Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/11/18 11:56:18 UTC

[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/23080

    [SPARK-26108][SQL] Support custom lineSep in CSV datasource

    ## What changes were proposed in this pull request?
    
    In this PR, I propose a new option for the CSV datasource, `lineSep`, similar to the one in the Text and JSON datasources. The option allows specifying a custom line separator of at most 2 characters (because of a restriction in the `uniVocity` parser). The new option can be used both for reading and for writing CSV files.
    
    ## How was this patch tested?
    
    Added a few read tests with a custom `lineSep` for both enabled and disabled `multiLine`, as well as write tests. I also added round-trip tests.
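Here is a rough, pure-Scala illustration of what the round-trip tests check: rows written with a custom `lineSep` should read back unchanged.

- `LineSepRoundTrip`, `write` and `read` are hypothetical names, not Spark's API.
- The sketch deliberately ignores quoting and escaping. It assumes values contain neither `,` nor the separator.

```scala
import java.util.regex.Pattern

// Hypothetical helpers, not Spark's API: a naive CSV writer/reader that joins
// fields with ',' and records with a custom line separator.
// Assumption: values never contain ',' or the separator (no quoting/escaping).
object LineSepRoundTrip {
  def write(rows: Seq[Seq[String]], lineSep: String): String =
    rows.map(_.mkString(",")).mkString(lineSep)

  def read(text: String, lineSep: String): Seq[Seq[String]] =
    if (text.isEmpty) Seq.empty
    else text.split(Pattern.quote(lineSep), -1).toSeq.map(_.split(",", -1).toSeq)

  def main(args: Array[String]): Unit = {
    val rows = Seq(Seq("a", "1"), Seq("c", "2"), Seq("d", "3"))
    val text = write(rows, "\u0001") // records separated by the control char U+0001
    println(read(text, "\u0001") == rows)
  }
}
```

`Pattern.quote` is used so that separators such as `|` are matched literally rather than as regex syntax.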


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 csv-line-sep

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23080.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23080
    
----
commit a790bb30e575cf6d4ffaeda307f0405f1bfecf03
Author: Maxim Gekk <ma...@...>
Date:   2018-11-17T21:44:47Z

    Added a test for default line separator

commit 7a47990af7a9e8782fbde2955c0cf6e4848a3806
Author: Maxim Gekk <ma...@...>
Date:   2018-11-17T21:56:34Z

    Test for custom lineSep

commit be2870f1006c3f2e783cec0c40bd6e1c7e4c5652
Author: Maxim Gekk <ma...@...>
Date:   2018-11-18T09:59:07Z

    Test on read

commit a058a6f2d6771173837ba4b6e829b2067993adb7
Author: Maxim Gekk <ma...@...>
Date:   2018-11-18T10:33:12Z

    Support lineSep in write

commit 7e3c0264ae93e270ed8b63c53897a2b775fa65ff
Author: Maxim Gekk <ma...@...>
Date:   2018-11-18T10:36:17Z

    Check roundtrip

commit 486b090139ce6d7a93a24edae000fb546b4931db
Author: Maxim Gekk <ma...@...>
Date:   2018-11-18T10:42:08Z

    Test another char

commit a0fedbbb06f33716fc632d3b4dd2a687b2587966
Author: Maxim Gekk <ma...@...>
Date:   2018-11-18T11:03:20Z

    Don't keep quotes

commit 5f013f505e7a57e4f72f6f1185f1dcdedc0960b5
Author: Maxim Gekk <ma...@...>
Date:   2018-11-18T11:13:38Z

    Support 2 chars as lineSep

commit 65786dfabbb5c901e3f8d32f737a6b24a2f58b6b
Author: Maxim Gekk <ma...@...>
Date:   2018-11-18T11:14:22Z

    Revert unrelated changes

commit 49b91ea06b757a2feed283de1634c36a59ace8f0
Author: Maxim Gekk <ma...@...>
Date:   2018-11-18T11:26:19Z

    Test restrictions for lineSep

commit 12022ad1a0194a4bab9007d66145071562e066a4
Author: Maxim Gekk <ma...@...>
Date:   2018-11-18T11:39:12Z

    Updating comments and docs

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23080#discussion_r235244407
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
    @@ -192,6 +192,20 @@ class CSVOptions(
        */
       val emptyValueInWrite = emptyValue.getOrElse("\"\"")
     
    +  /**
    +   * A string between two consecutive JSON records.
    +   */
    +  val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
    +    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
    +    require(sep.length <= 2, "'lineSep' can contain 1 or 2 characters.")
    --- End diff --
    
    Hm, I see.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #99122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99122/testReport)** for PR 23080 at commit [`1f5399f`](https://github.com/apache/spark/commit/1f5399f32a45fc7892cf5ce009b1a75221e844dd).


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Merged to master.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #98982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98982/testReport)** for PR 23080 at commit [`12022ad`](https://github.com/apache/spark/commit/12022ad1a0194a4bab9007d66145071562e066a4).


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5307/
    Test PASSed.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99117/
    Test PASSed.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5223/
    Test PASSed.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #98979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98979/testReport)** for PR 23080 at commit [`12022ad`](https://github.com/apache/spark/commit/12022ad1a0194a4bab9007d66145071562e066a4).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #99122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99122/testReport)** for PR 23080 at commit [`1f5399f`](https://github.com/apache/spark/commit/1f5399f32a45fc7892cf5ce009b1a75221e844dd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #98982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98982/testReport)** for PR 23080 at commit [`12022ad`](https://github.com/apache/spark/commit/12022ad1a0194a4bab9007d66145071562e066a4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98982/
    Test PASSed.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #99117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99117/testReport)** for PR 23080 at commit [`1f5399f`](https://github.com/apache/spark/commit/1f5399f32a45fc7892cf5ce009b1a75221e844dd).


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by pooja-murarka <gi...@git.apache.org>.
Github user pooja-murarka commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    I am testing **lineSep** with Spark 2.4.
    
    data.csv: "a",1   "c",2   "d",3
    
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
    
    val schema: StructType = StructType(Seq(
      StructField(name = "dteday", dataType = StringType),
      StructField(name = "hr", dataType = IntegerType)
    ))
    val logData = spark.read.format("csv").schema(schema).option("lineSep", "\t").load("data.csv")
    
    But I only see the schema and a single row of nulls instead of the data:
    
    scala> logData.show()
    +------+----+
    |dteday|  hr|
    +------+----+
    |  null|null|
    +------+----+
    
    Could you please tell me whether I missed something, or whether the above fix has not been merged into that branch?
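For reference, here is a pure-Scala sketch (not Spark code) of the result one would expect if `lineSep` were honored for this input. The input is split into three records on `\t`, and each record is parsed against the `(dteday: String, hr: Int)` schema.

- It assumes the gaps in the `data.csv` line above are tab characters.
- `TabSeparatedRecords`, `parse` and `unquote` are hypothetical names.
- It says nothing about how Spark 2.4 actually treats the file.

```scala
// Pure-Scala sketch, not Spark code: what splitting the reported input on a
// custom record separator ('\t' here) would be expected to produce.
object TabSeparatedRecords {
  // Hypothetical helper: strips one pair of surrounding double quotes, if present.
  private def unquote(s: String): String =
    if (s.length >= 2 && s.startsWith("\"") && s.endsWith("\"")) s.substring(1, s.length - 1)
    else s

  // Splits records on lineSep and fields on ',', then converts the second field
  // to Int, mirroring the (dteday: String, hr: Int) schema from the comment.
  def parse(data: String, lineSep: Char): Seq[(String, Int)] =
    data.split(lineSep).toSeq.filter(_.nonEmpty).map { record =>
      val Array(dteday, hr) = record.split(",", 2)
      (unquote(dteday), hr.trim.toInt)
    }

  def main(args: Array[String]): Unit = {
    val data = "\"a\",1\t\"c\",2\t\"d\",3"
    parse(data, '\t').foreach(println) // prints (a,1), (c,2), (d,3), one per line
  }
}
```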


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99181/
    Test PASSed.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Ah, also, `CsvParser.beginParsing` takes an additional `Charset` argument, so it should be fairly easy to support encodings in `multiLine` mode. @MaxGekk, would you be able to find some time to work on it? If that change makes the current PR easier, we can merge that one first.
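Here is a minimal JDK-only sketch of the point about charsets, assuming UTF-16LE input. Separators should be matched in the decoded characters, not in the raw bytes, because in UTF-16LE even `\n` takes two bytes.

This sketch does not use univocity's `beginParsing`. `DecodeThenSplit` and `readRecords` are hypothetical names.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStreamReader}
import java.nio.charset.{Charset, StandardCharsets}
import java.util.regex.Pattern

// Sketch: decode the bytes with the declared charset first, then look for the
// line separator in the decoded characters.
object DecodeThenSplit {
  def readRecords(bytes: Array[Byte], charset: Charset, lineSep: String): Seq[String] = {
    val reader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(bytes), charset))
    val text =
      try Iterator.continually(reader.read()).takeWhile(_ != -1).map(_.toChar).mkString
      finally reader.close()
    text.split(Pattern.quote(lineSep), -1).toSeq
  }

  def main(args: Array[String]): Unit = {
    // In UTF-16LE, '\n' is encoded as the two bytes 0x0A 0x00.
    val bytes = "a,1\nc,2".getBytes(StandardCharsets.UTF_16LE)
    println(readRecords(bytes, StandardCharsets.UTF_16LE, "\n"))
  }
}
```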


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99122/
    Test PASSed.


---



[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23080#discussion_r234475228
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
    @@ -192,6 +192,20 @@ class CSVOptions(
        */
       val emptyValueInWrite = emptyValue.getOrElse("\"\"")
     
    +  /**
    +   * A string between two consecutive JSON records.
    +   */
    +  val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
    +    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
    +    require(sep.length <= 2, "'lineSep' can contain 1 or 2 characters.")
    --- End diff --
    
    @MaxGekk, this might not be a big deal, but I believe the length should be counted after converting the separator into `UTF-8`.
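This small sketch shows why the counting unit matters. It compares `String.length`, which counts UTF-16 code units and is what `sep.length` in the diff measures, with the UTF-8 byte length. `LineSepLength` and its methods are hypothetical helpers.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helpers: compare the Java String length (UTF-16 code units)
// of a candidate lineSep with its length in bytes once encoded as UTF-8.
object LineSepLength {
  def charLength(sep: String): Int = sep.length
  def utf8Length(sep: String): Int = sep.getBytes(StandardCharsets.UTF_8).length

  // Renders each char as U+XXXX so invisible separators are readable.
  def describe(sep: String): String = sep.map(c => "U+%04X".format(c.toInt)).mkString(" ")

  def main(args: Array[String]): Unit = {
    for (sep <- Seq("\n", "\r\n", "\u00e9", "\u20ac"))
      println(s"${describe(sep)}: chars=${charLength(sep)}, utf8Bytes=${utf8Length(sep)}")
  }
}
```

For example, `\u20ac` (the euro sign) passes a `sep.length <= 2` check even though it takes 3 bytes in UTF-8. In contrast, `\r\n` counts as 2 in both units.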


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #99181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99181/testReport)** for PR 23080 at commit [`918d163`](https://github.com/apache/spark/commit/918d163541cb54e37b7ddc4fc337a299343fc31d).


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    The last changes were doc-only. Let me get this in.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    It's supported in the upcoming Spark release. Spark 2.4 does not support it.


---



[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23080


---



[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23080#discussion_r235589426
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
    @@ -216,8 +232,13 @@ class CSVOptions(
         format.setDelimiter(delimiter)
         format.setQuote(quote)
         format.setQuoteEscape(escape)
    +    lineSeparator.foreach {sep =>
    +      format.setLineSeparator(sep)
    +      format.setNormalizedNewline(0x00.toChar)
    --- End diff --
    
    I know we have some problems here with line separators longer than 1 character, because `setNormalizedNewline` only supports one character.
    
    This is related to https://github.com/apache/spark/pull/18581#issuecomment-314037750 and https://github.com/uniVocity/univocity-parsers/issues/170
    
    That's why I thought we can only support this for single character for now.



---



[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23080#discussion_r235830894
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -377,6 +377,8 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
        * <li>`multiLine` (default `false`): parse one record, which may span multiple lines.</li>
        * <li>`locale` (default is `en-US`): sets a locale as language tag in IETF BCP 47 format.
        * For instance, this is used while parsing dates and timestamps.</li>
    +   * <li>`lineSep` (default covers all `\r`, `\r\n` and `\n`): defines the line separator
    +   * that should be used for parsing. Maximum length is 2.</li>
    --- End diff --
    
    Sorry, can you fix `Maximum length is 2` as well? Then it should be good to go.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    LGTM except https://github.com/apache/spark/pull/23080/files#r235589426


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    jenkins, retest this, please


---



[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23080#discussion_r235634152
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
    @@ -216,8 +232,13 @@ class CSVOptions(
         format.setDelimiter(delimiter)
         format.setQuote(quote)
         format.setQuoteEscape(escape)
    +    lineSeparator.foreach {sep =>
    +      format.setLineSeparator(sep)
    +      format.setNormalizedNewline(0x00.toChar)
    --- End diff --
    
    > That's why I thought we can only support this for single character for now.
    
    OK, I will restrict line separators to one character.
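Here is a minimal sketch of the option validation under the one-character restriction agreed here. This is not the code that was eventually merged.

- `LineSepOption`, `parseLineSep` and the `maxLength` parameter are hypothetical.
- The empty-string check and its message are copied from the `CSVOptions` diff quoted earlier in the thread.
- That diff's length check was `sep.length <= 2`.

```scala
import scala.util.Try

// Sketch of lineSep validation. `maxLength` is a hypothetical parameter that
// defaults to the one-character restriction discussed in this thread.
object LineSepOption {
  def parseLineSep(parameters: Map[String, String], maxLength: Int = 1): Option[String] =
    parameters.get("lineSep").map { sep =>
      require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
      require(sep.length <= maxLength, s"'lineSep' can contain at most $maxLength character(s).")
      sep
    }

  def main(args: Array[String]): Unit = {
    println(parseLineSep(Map("lineSep" -> "|")))                    // Some(|)
    println(parseLineSep(Map.empty))                                // None
    println(Try(parseLineSep(Map("lineSep" -> "\r\n"))).isFailure)  // true: 2 chars rejected
  }
}
```

`require` throws `IllegalArgumentException`, so an invalid value fails fast when the options are parsed, not later in the reader.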


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    jenkins, retest this, please


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #98980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98980/testReport)** for PR 23080 at commit [`12022ad`](https://github.com/apache/spark/commit/12022ad1a0194a4bab9007d66145071562e066a4).


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #99181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99181/testReport)** for PR 23080 at commit [`918d163`](https://github.com/apache/spark/commit/918d163541cb54e37b7ddc4fc337a299343fc31d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99215/
    Test PASSed.


---



[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #98980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98980/testReport)** for PR 23080 at commit [`12022ad`](https://github.com/apache/spark/commit/12022ad1a0194a4bab9007d66145071562e066a4).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #99105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99105/testReport)** for PR 23080 at commit [`bb8a13b`](https://github.com/apache/spark/commit/bb8a13b8c1e7e6b8848eec1693a46e35e2a86e2f).




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #99215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99215/testReport)** for PR 23080 at commit [`a4c4b67`](https://github.com/apache/spark/commit/a4c4b6710cb67bddd9badbb53aa07b0d93242bc5).




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #99117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99117/testReport)** for PR 23080 at commit [`1f5399f`](https://github.com/apache/spark/commit/1f5399f32a45fc7892cf5ce009b1a75221e844dd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5122/


[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98979/


[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #99105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99105/testReport)** for PR 23080 at commit [`bb8a13b`](https://github.com/apache/spark/commit/bb8a13b8c1e7e6b8848eec1693a46e35e2a86e2f).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    > would you be able to find some time to work on it? If that change can make the current PR easier, we can merge that one first.
    
    I will try




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98980/


[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #98979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98979/testReport)** for PR 23080 at commit [`12022ad`](https://github.com/apache/spark/commit/12022ad1a0194a4bab9007d66145071562e066a4).




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #99210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99210/testReport)** for PR 23080 at commit [`a4c4b67`](https://github.com/apache/spark/commit/a4c4b6710cb67bddd9badbb53aa07b0d93242bc5).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5280/


[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23080#discussion_r234551573
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
    @@ -192,6 +192,20 @@ class CSVOptions(
        */
       val emptyValueInWrite = emptyValue.getOrElse("\"\"")
     
    +  /**
    +   * A string between two consecutive JSON records.
    +   */
    +  val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
    +    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
    +    require(sep.length <= 2, "'lineSep' can contain 1 or 2 characters.")
    --- End diff --
    
    The `uniVocity` parser checks the number of chars; see https://github.com/uniVocity/univocity-parsers/blob/f616d151b48150bc9cb98943f9b6f8353b704359/src/main/java/com/univocity/parsers/common/Format.java#L120-L122
    
    Those chars are `UTF-16` code units, I guess.
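    For context, `String.length` on the JVM counts UTF-16 code units, which differs from both the number of Unicode code points and the number of bytes written to the file. A minimal Scala sketch comparing the three measures for a few candidate separators (`SeparatorLengths`, `measure` and `label` are hypothetical names, not Spark or uniVocity APIs):

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper (not part of Spark or uniVocity): compares the ways
// a separator's "length" can be measured on the JVM.
object SeparatorLengths {
  final case class Lengths(utf16Units: Int, codePoints: Int, utf8Bytes: Int)

  def measure(sep: String): Lengths =
    Lengths(
      utf16Units = sep.length,                        // what `sep.length <= 2` checks
      codePoints = sep.codePointCount(0, sep.length), // Unicode code points
      utf8Bytes = sep.getBytes(StandardCharsets.UTF_8).length
    )

  // Renders a string as its code points, e.g. "U+000D U+000A" for CRLF.
  def label(s: String): String =
    s.codePoints().toArray().map(cp => f"U+$cp%04X").mkString(" ")

  def main(args: Array[String]): Unit = {
    // LF, CRLF, e-acute, euro sign, and an emoji outside the BMP (a surrogate pair).
    val samples = Seq("\n", "\r\n", "\u00e9", "\u20ac", "\ud83d\ude00")
    samples.foreach { s =>
      val l = measure(s)
      println(s"${label(s)}: utf16Units=${l.utf16Units}, codePoints=${l.codePoints}, utf8Bytes=${l.utf8Bytes}")
    }
  }
}
```

    Under this reading, `sep.length <= 2` accepts `\r\n` and also a single emoji (one code point, two UTF-16 units, four UTF-8 bytes), while `€` passes with length 1 even though it takes three bytes in UTF-8.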




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    retest this please




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5228/


[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #99215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99215/testReport)** for PR 23080 at commit [`a4c4b67`](https://github.com/apache/spark/commit/a4c4b6710cb67bddd9badbb53aa07b0d93242bc5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    @HyukjinKwon Could you take a look at the PR, please?




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    @MaxGekk, let's rebase this one and adjust it for the encoding support.




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99105/


[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23080#discussion_r234476318
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
    @@ -192,6 +192,20 @@ class CSVOptions(
        */
       val emptyValueInWrite = emptyValue.getOrElse("\"\"")
     
    +  /**
    +   * A string between two consecutive JSON records.
    +   */
    +  val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
    +    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
    +    require(sep.length <= 2, "'lineSep' can contain 1 or 2 characters.")
    +    sep
    +  }
    +
    +  val lineSeparatorInRead: Option[Array[Byte]] = lineSeparator.map { lineSep =>
    +    lineSep.getBytes("UTF-8")
    --- End diff --
    
    @MaxGekk, CSV's multiLine mode does not support `encoding`, but I think the normal mode does. So it should be okay to get the bytes using that encoding. We can just throw an exception when multiLine is enabled.
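    One possible reading of this suggestion, as a minimal sketch: encode `lineSep` with the user's `encoding` instead of always with UTF-8, and fail fast in multiLine mode. `ReadOptions` is a hypothetical stand-in for a few `CSVOptions` fields, and rejecting only non-UTF-8 encodings when `multiLine` is on is an assumption about what "throw an exception when multiline is enabled" means here:

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical stand-in for a few CSVOptions fields; not Spark's actual class.
final case class ReadOptions(lineSep: Option[String], encoding: String, multiLine: Boolean) {
  // Declared first so it is initialized before lineSeparatorInRead uses it.
  private val charset: Charset = Charset.forName(encoding)

  // Encode the separator with the configured charset instead of hard-coding UTF-8.
  val lineSeparatorInRead: Option[Array[Byte]] = lineSep.map { sep =>
    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
    // Assumption: multiLine mode ignores `encoding`, so a custom separator combined
    // with a non-UTF-8 encoding is rejected there instead of being silently misread.
    require(
      !multiLine || charset == StandardCharsets.UTF_8,
      s"'lineSep' with encoding '$encoding' is not supported when multiLine is enabled.")
    sep.getBytes(charset)
  }
}

object ReadOptionsDemo {
  def main(args: Array[String]): Unit = {
    val utf16 = ReadOptions(Some("\n"), "UTF-16LE", multiLine = false)
    println(utf16.lineSeparatorInRead.map(_.mkString("[", ", ", "]"))) // Some([10, 0])

    val rejected =
      try { ReadOptions(Some("\n"), "UTF-16LE", multiLine = true); false }
      catch { case _: IllegalArgumentException => true }
    println(s"rejected with multiLine: $rejected") // rejected with multiLine: true
  }
}
```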




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5123/


[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5219/


[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99210/


[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    **[Test build #99210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99210/testReport)** for PR 23080 at commit [`a4c4b67`](https://github.com/apache/spark/commit/a4c4b6710cb67bddd9badbb53aa07b0d93242bc5).




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    jenkins, retest this, please




[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23080#discussion_r235589448
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
    @@ -227,7 +248,10 @@ class CSVOptions(
         settings.setEmptyValue(emptyValueInRead)
         settings.setMaxCharsPerColumn(maxCharsPerColumn)
         settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER)
    -    settings.setLineSeparatorDetectionEnabled(multiLine == true)
    +    settings.setLineSeparatorDetectionEnabled(lineSeparatorInRead.isEmpty && multiLine)
    +    lineSeparatorInRead.foreach { _ =>
    --- End diff --
    
    nice!
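    The new condition turns uniVocity's line-separator auto-detection off whenever an explicit `lineSep` is given; before the change it depended on `multiLine` alone. A small sketch of that decision as a pure function (`LineSeparatorDetection` and `detectionEnabled` are hypothetical names; the diff itself passes the expression straight to `settings.setLineSeparatorDetectionEnabled`):

```scala
// Hypothetical helper that mirrors the boolean passed to
// settings.setLineSeparatorDetectionEnabled(...) in the diff above.
object LineSeparatorDetection {
  def detectionEnabled(lineSeparatorInRead: Option[Array[Byte]], multiLine: Boolean): Boolean =
    lineSeparatorInRead.isEmpty && multiLine

  def main(args: Array[String]): Unit = {
    val seps: Seq[Option[Array[Byte]]] = Seq(None, Some("\n".getBytes("UTF-8")))
    // Print the full truth table: detection is on only for multiLine without lineSep.
    for (sep <- seps; multiLine <- Seq(false, true)) {
      val shown = sep.map(_.mkString("[", ",", "]")).getOrElse("unset")
      println(s"lineSep=$shown multiLine=$multiLine -> detection=${detectionEnabled(sep, multiLine)}")
    }
  }
}
```

    Keeping detection off when the user names the separator avoids the parser guessing a different one from the first lines of the input.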




[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23080#discussion_r234475595
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ---
    @@ -192,6 +192,20 @@ class CSVOptions(
        */
       val emptyValueInWrite = emptyValue.getOrElse("\"\"")
     
    +  /**
    +   * A string between two consecutive JSON records.
    +   */
    +  val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
    +    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
    +    require(sep.length <= 2, "'lineSep' can contain 1 or 2 characters.")
    --- End diff --
    
    We could say the line separator should be 1 or 2 bytes (UTF-8) in the read path specifically.
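    A minimal sketch of the byte-based check suggested here, assuming the read path compares the UTF-8 encoding of `lineSep` against the 2-byte limit rather than `String.length` (`ReadPathLineSep`, `validateForRead` and `isValid` are hypothetical names):

```scala
import java.nio.charset.StandardCharsets

// Hypothetical read-path validation: limits the separator by its UTF-8 byte
// length instead of by String.length (UTF-16 code units).
object ReadPathLineSep {
  def validateForRead(sep: String): Array[Byte] = {
    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
    val bytes = sep.getBytes(StandardCharsets.UTF_8)
    require(bytes.length <= 2,
      s"'lineSep' must be 1 or 2 bytes in UTF-8 in the read path, got ${bytes.length}.")
    bytes
  }

  // Convenience wrapper: true if validateForRead accepts the separator.
  def isValid(sep: String): Boolean =
    try { validateForRead(sep); true }
    catch { case _: IllegalArgumentException => false }

  def main(args: Array[String]): Unit = {
    for (sep <- Seq("\n", "\r\n", "\u00e9", "\u20ac", "abc"))
      println(s"${sep.codePoints().toArray().map(cp => f"U+$cp%04X").mkString(" ")} valid=${isValid(sep)}")
  }
}
```

    With this check, `\r\n` and `é` (two UTF-8 bytes each) pass, while `€` is rejected because it encodes to three bytes, even though `"€".length` is 1.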




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5125/


[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    @MaxGekk, thanks for working on this one.




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23080
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5304/