You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/11/18 11:56:18 UTC
[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/23080
[SPARK-26108][SQL] Support custom lineSep in CSV datasource
## What changes were proposed in this pull request?
In the PR, I propose new options for CSV datasource - `lineSep` similar to Text and JSON datasource. The option allows to specify custom line separator of maximum length of 2 characters (because of a restriction in `uniVocity` parser). New option can be used in reading and writing CSV files.
## How was this patch tested?
Added a few tests with custom `lineSep` for enabled/disabled `multiLine` in read as well as tests in write. Also I added roundtrip tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 csv-line-sep
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23080.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #23080
----
commit a790bb30e575cf6d4ffaeda307f0405f1bfecf03
Author: Maxim Gekk <ma...@...>
Date: 2018-11-17T21:44:47Z
Added a test for default line separator
commit 7a47990af7a9e8782fbde2955c0cf6e4848a3806
Author: Maxim Gekk <ma...@...>
Date: 2018-11-17T21:56:34Z
Test for custom lineSep
commit be2870f1006c3f2e783cec0c40bd6e1c7e4c5652
Author: Maxim Gekk <ma...@...>
Date: 2018-11-18T09:59:07Z
Test on read
commit a058a6f2d6771173837ba4b6e829b2067993adb7
Author: Maxim Gekk <ma...@...>
Date: 2018-11-18T10:33:12Z
Support lineSep in write
commit 7e3c0264ae93e270ed8b63c53897a2b775fa65ff
Author: Maxim Gekk <ma...@...>
Date: 2018-11-18T10:36:17Z
Check roundtrip
commit 486b090139ce6d7a93a24edae000fb546b4931db
Author: Maxim Gekk <ma...@...>
Date: 2018-11-18T10:42:08Z
Test another char
commit a0fedbbb06f33716fc632d3b4dd2a687b2587966
Author: Maxim Gekk <ma...@...>
Date: 2018-11-18T11:03:20Z
Don't keep quotes
commit 5f013f505e7a57e4f72f6f1185f1dcdedc0960b5
Author: Maxim Gekk <ma...@...>
Date: 2018-11-18T11:13:38Z
Support 2 chars as lineSep
commit 65786dfabbb5c901e3f8d32f737a6b24a2f58b6b
Author: Maxim Gekk <ma...@...>
Date: 2018-11-18T11:14:22Z
Revert unrelated changes
commit 49b91ea06b757a2feed283de1634c36a59ace8f0
Author: Maxim Gekk <ma...@...>
Date: 2018-11-18T11:26:19Z
Test restrictions for lineSep
commit 12022ad1a0194a4bab9007d66145071562e066a4
Author: Maxim Gekk <ma...@...>
Date: 2018-11-18T11:39:12Z
Updating comments and docs
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/23080#discussion_r235244407
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
@@ -192,6 +192,20 @@ class CSVOptions(
*/
val emptyValueInWrite = emptyValue.getOrElse("\"\"")
+ /**
+ * A string between two consecutive JSON records.
+ */
+ val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
+ require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
+ require(sep.length <= 2, "'lineSep' can contain 1 or 2 characters.")
--- End diff --
Hm, I see.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #99122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99122/testReport)** for PR 23080 at commit [`1f5399f`](https://github.com/apache/spark/commit/1f5399f32a45fc7892cf5ce009b1a75221e844dd).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23080
Merged to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #98982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98982/testReport)** for PR 23080 at commit [`12022ad`](https://github.com/apache/spark/commit/12022ad1a0194a4bab9007d66145071562e066a4).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5307/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99117/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5223/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #98979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98979/testReport)** for PR 23080 at commit [`12022ad`](https://github.com/apache/spark/commit/12022ad1a0194a4bab9007d66145071562e066a4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #99122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99122/testReport)** for PR 23080 at commit [`1f5399f`](https://github.com/apache/spark/commit/1f5399f32a45fc7892cf5ce009b1a75221e844dd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #98982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98982/testReport)** for PR 23080 at commit [`12022ad`](https://github.com/apache/spark/commit/12022ad1a0194a4bab9007d66145071562e066a4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98982/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #99117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99117/testReport)** for PR 23080 at commit [`1f5399f`](https://github.com/apache/spark/commit/1f5399f32a45fc7892cf5ce009b1a75221e844dd).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by pooja-murarka <gi...@git.apache.org>.
Github user pooja-murarka commented on the issue:
https://github.com/apache/spark/pull/23080
I am testing **lineSep** with spark 2.4
data.csv : "a",1 "c",2 "d",3
val schema : StructType =
StructType(
Seq(
StructField(name = "dteday", dataType = StringType),
StructField(name = "hr", dataType = IntegerType)
)
_val logData = spark.read.format("csv").schema(schema).option("lineSep", "\t").load("data.csv")_
But can only see schema without any data.
scala> logData.show()
+------+----+
|dteday| hr|
+------+----+
| null|null|
+------+----+
Can you please suggest if i missed something or above fix has not been merged with branch.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99181/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23080
Ah, also, `CsvParser.beginParsing` takes an additional argument `Charset`. It should rather be easily able to support encoding in `multiLine`. @MaxGekk, would you be able to find some time to work on it? If that change can make the current PR easier. we can merge that one first.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99122/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/23080#discussion_r234475228
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
@@ -192,6 +192,20 @@ class CSVOptions(
*/
val emptyValueInWrite = emptyValue.getOrElse("\"\"")
+ /**
+ * A string between two consecutive JSON records.
+ */
+ val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
+ require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
+ require(sep.length <= 2, "'lineSep' can contain 1 or 2 characters.")
--- End diff --
@MaxGekk, might not be a super big deal but I believe this should be counted after converting it into `UTF-8`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #99181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99181/testReport)** for PR 23080 at commit [`918d163`](https://github.com/apache/spark/commit/918d163541cb54e37b7ddc4fc337a299343fc31d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23080
Last changes were only doc changes. Let me get this in.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23080
It's fixed in upcoming Spark. Spark 2.4 does not support it.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/23080
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/23080#discussion_r235589426
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
@@ -216,8 +232,13 @@ class CSVOptions(
format.setDelimiter(delimiter)
format.setQuote(quote)
format.setQuoteEscape(escape)
+ lineSeparator.foreach {sep =>
+ format.setLineSeparator(sep)
+ format.setNormalizedNewline(0x00.toChar)
--- End diff --
I know we have some problems here for setting newlines more then 1 character because `setNormalizedNewline` only supports one character.
This is related with https://github.com/apache/spark/pull/18581#issuecomment-314037750 and https://github.com/uniVocity/univocity-parsers/issues/170
That's why I thought we can only support this for single character for now.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/23080#discussion_r235830894
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
@@ -377,6 +377,8 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
* <li>`multiLine` (default `false`): parse one record, which may span multiple lines.</li>
* <li>`locale` (default is `en-US`): sets a locale as language tag in IETF BCP 47 format.
* For instance, this is used while parsing dates and timestamps.</li>
+ * <li>`lineSep` (default covers all `\r`, `\r\n` and `\n`): defines the line separator
+ * that should be used for parsing. Maximum length is 2.</li>
--- End diff --
I'm sorry. can you fix `Maximum length is 2` as well? should be good to go.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23080
LGTM except https://github.com/apache/spark/pull/23080/files#r235589426
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/23080
jenkins, retest this, please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/23080#discussion_r235634152
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
@@ -216,8 +232,13 @@ class CSVOptions(
format.setDelimiter(delimiter)
format.setQuote(quote)
format.setQuoteEscape(escape)
+ lineSeparator.foreach {sep =>
+ format.setLineSeparator(sep)
+ format.setNormalizedNewline(0x00.toChar)
--- End diff --
> That's why I thought we can only support this for single character for now.
ok. I will restrict line separators by one character.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/23080
jenkins, retest this, please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #98980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98980/testReport)** for PR 23080 at commit [`12022ad`](https://github.com/apache/spark/commit/12022ad1a0194a4bab9007d66145071562e066a4).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #99181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99181/testReport)** for PR 23080 at commit [`918d163`](https://github.com/apache/spark/commit/918d163541cb54e37b7ddc4fc337a299343fc31d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99215/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #98980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98980/testReport)** for PR 23080 at commit [`12022ad`](https://github.com/apache/spark/commit/12022ad1a0194a4bab9007d66145071562e066a4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #99105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99105/testReport)** for PR 23080 at commit [`bb8a13b`](https://github.com/apache/spark/commit/bb8a13b8c1e7e6b8848eec1693a46e35e2a86e2f).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #99215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99215/testReport)** for PR 23080 at commit [`a4c4b67`](https://github.com/apache/spark/commit/a4c4b6710cb67bddd9badbb53aa07b0d93242bc5).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #99117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99117/testReport)** for PR 23080 at commit [`1f5399f`](https://github.com/apache/spark/commit/1f5399f32a45fc7892cf5ce009b1a75221e844dd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5122/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98979/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #99105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99105/testReport)** for PR 23080 at commit [`bb8a13b`](https://github.com/apache/spark/commit/bb8a13b8c1e7e6b8848eec1693a46e35e2a86e2f).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/23080
> would you be able to find some time to work on it? If that change can make the current PR easier. we can merge that one first.
I will try
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98980/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #98979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98979/testReport)** for PR 23080 at commit [`12022ad`](https://github.com/apache/spark/commit/12022ad1a0194a4bab9007d66145071562e066a4).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #99210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99210/testReport)** for PR 23080 at commit [`a4c4b67`](https://github.com/apache/spark/commit/a4c4b6710cb67bddd9badbb53aa07b0d93242bc5).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5280/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/23080#discussion_r234551573
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
@@ -192,6 +192,20 @@ class CSVOptions(
*/
val emptyValueInWrite = emptyValue.getOrElse("\"\"")
+ /**
+ * A string between two consecutive JSON records.
+ */
+ val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
+ require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
+ require(sep.length <= 2, "'lineSep' can contain 1 or 2 characters.")
--- End diff --
`uniVocity` parser checks number of chars, see https://github.com/uniVocity/univocity-parsers/blob/f616d151b48150bc9cb98943f9b6f8353b704359/src/main/java/com/univocity/parsers/common/Format.java#L120-L122
and those chars are in `UTF-16`, I guess.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23080
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5228/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #99215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99215/testReport)** for PR 23080 at commit [`a4c4b67`](https://github.com/apache/spark/commit/a4c4b6710cb67bddd9badbb53aa07b0d93242bc5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/23080
@HyukjinKwon Could you look at the PR, please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23080
@MaxGekk, let's rebase this one accordingly with encoding support.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99105/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/23080#discussion_r234476318
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
@@ -192,6 +192,20 @@ class CSVOptions(
*/
val emptyValueInWrite = emptyValue.getOrElse("\"\"")
+ /**
+ * A string between two consecutive JSON records.
+ */
+ val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
+ require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
+ require(sep.length <= 2, "'lineSep' can contain 1 or 2 characters.")
+ sep
+ }
+
+ val lineSeparatorInRead: Option[Array[Byte]] = lineSeparator.map { lineSep =>
+ lineSep.getBytes("UTF-8")
--- End diff --
@MaxGekk, CSV's multiline does not support encoding but I think normal mode supports `encoding`. It should be okay to get bytes from it. We can just throw an exception when multiline is enabled.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5123/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5219/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99210/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23080
**[Test build #99210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99210/testReport)** for PR 23080 at commit [`a4c4b67`](https://github.com/apache/spark/commit/a4c4b6710cb67bddd9badbb53aa07b0d93242bc5).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/23080
jenkins, retest this, please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/23080#discussion_r235589448
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
@@ -227,7 +248,10 @@ class CSVOptions(
settings.setEmptyValue(emptyValueInRead)
settings.setMaxCharsPerColumn(maxCharsPerColumn)
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER)
- settings.setLineSeparatorDetectionEnabled(multiLine == true)
+ settings.setLineSeparatorDetectionEnabled(lineSeparatorInRead.isEmpty && multiLine)
+ lineSeparatorInRead.foreach { _ =>
--- End diff --
nice!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/23080#discussion_r234475595
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
@@ -192,6 +192,20 @@ class CSVOptions(
*/
val emptyValueInWrite = emptyValue.getOrElse("\"\"")
+ /**
+ * A string between two consecutive JSON records.
+ */
+ val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
+ require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
+ require(sep.length <= 2, "'lineSep' can contain 1 or 2 characters.")
--- End diff --
We could say the line separator should be 1 or 2 bytes (UTF-8) in read path specifically.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5125/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23080
@MaxGekk, thanks for working on this one.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5304/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org