You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/10/06 10:15:49 UTC

[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/22654

    [SPARK-25660][SQL] Fix for the backward slash as CSV fields delimiter

    ## What changes were proposed in this pull request?
    
    The PR addresses the exception raised on accessing chars out of delimiter string. In particular, the backward slash `\` as the CSV fields delimiter causes the following exception on reading `abc\1`:
    ```Scala
    String index out of range: 1
    java.lang.StringIndexOutOfBoundsException: String index out of range: 1
    	at java.lang.String.charAt(String.java:658)
    ``` 
    because `str.charAt(1)` tries to access a char out of `str` in `CSVUtils.toChar`
    
    ## How was this patch tested?
    
    Added tests for empty string and string containing the backward slash to `CSVUtilsSuite`. Besides of that I added an end-to-end test to check how the backward slash is handled in reading CSV string with it.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 csv-slash-delim

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22654.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22654
    
----
commit dd16ca349dd20626b6a8ae00f6dd445c269104b8
Author: Maxim Gekk <ma...@...>
Date:   2018-10-06T09:12:33Z

    Test for backward slash as the delimiter

commit 7bf453a5a5ddedda4319d114bbbb6b8c125fa0d4
Author: Maxim Gekk <ma...@...>
Date:   2018-10-06T09:55:15Z

    Bug fix + tests

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97306/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    @gatorsmile Could you look at it one more time, please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    **[Test build #97045 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97045/testReport)** for PR 22654 at commit [`7bf453a`](https://github.com/apache/spark/commit/7bf453a5a5ddedda4319d114bbbb6b8c125fa0d4).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22654#discussion_r224799147
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -1826,4 +1826,13 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
         val df = spark.read.option("enforceSchema", false).csv(input)
         checkAnswer(df, Row("1", "2"))
       }
    +
    +  test("using the backward slash as the delimiter") {
    +    val input = Seq("""abc\1""").toDS()
    --- End diff --
    
    I prohibited single backslash and throw an exception with a tip of using double backslash.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22654#discussion_r223726757
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
    @@ -97,23 +97,22 @@ object CSVUtils {
        */
       @throws[IllegalArgumentException]
       def toChar(str: String): Char = {
    -    if (str.charAt(0) == '\\') {
    -      str.charAt(1)
    -      match {
    -        case 't' => '\t'
    -        case 'r' => '\r'
    -        case 'b' => '\b'
    -        case 'f' => '\f'
    -        case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
    -        case '\'' => '\''
    -        case 'u' if str == """\u0000""" => '\u0000'
    -        case _ =>
    -          throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
    -      }
    -    } else if (str.length == 1) {
    -      str.charAt(0)
    -    } else {
    -      throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
    +    (str: Seq[Char]) match {
    +      case Seq() => throw new IllegalArgumentException("Delimiter cannot be empty string")
    +      case Seq(c) => c
    --- End diff --
    
    I'm missing why we had to switch up the case statement like this. I get that we need to cover more cases, but there was duplication and now there is a bit more. What about ...
    
    ```
    str.length match {
      case 0 => // error
      case 1 => str(0)
      case 2 if str(0) == '\\' =>
        str(1) match {
          case c if """trbf"'\""".contains(c) => c
          case 'u' if str == """\u0000""" => '\0'
          case _ => // error
        }
      case _ => // error
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    LGTM
    
    Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22654


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97048/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    **[Test build #97045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97045/testReport)** for PR 22654 at commit [`7bf453a`](https://github.com/apache/spark/commit/7bf453a5a5ddedda4319d114bbbb6b8c125fa0d4).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22654#discussion_r223635442
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
    @@ -97,23 +97,21 @@ object CSVUtils {
        */
       @throws[IllegalArgumentException]
       def toChar(str: String): Char = {
    -    if (str.charAt(0) == '\\') {
    -      str.charAt(1)
    -      match {
    -        case 't' => '\t'
    -        case 'r' => '\r'
    -        case 'b' => '\b'
    -        case 'f' => '\f'
    -        case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
    -        case '\'' => '\''
    -        case 'u' if str == """\u0000""" => '\u0000'
    -        case _ =>
    -          throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
    -      }
    -    } else if (str.length == 1) {
    -      str.charAt(0)
    -    } else {
    -      throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
    +    (str: Seq[Char]) match {
    +      case Seq() => throw new IllegalArgumentException(s"Delimiter cannot be empty string")
    +      case Seq(c) => c
    +      case Seq('\\', 't') => '\t'
    +      case Seq('\\', 'r') => '\r'
    +      case Seq('\\', 'b') => '\b'
    +      case Seq('\\', 'f') => '\f'
    +      // In case user changes quote char and uses \" as delimiter in options
    +      case Seq('\\', '\"') => '\"'
    +      case Seq('\\', '\'') => '\''
    --- End diff --
    
    Added the case + a test for that


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    **[Test build #97048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97048/testReport)** for PR 22654 at commit [`7bf453a`](https://github.com/apache/spark/commit/7bf453a5a5ddedda4319d114bbbb6b8c125fa0d4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22654#discussion_r223724099
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -1826,4 +1826,13 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
         val df = spark.read.option("enforceSchema", false).csv(input)
         checkAnswer(df, Row("1", "2"))
       }
    +
    +  test("using the backward slash as the delimiter") {
    +    val input = Seq("""abc\1""").toDS()
    --- End diff --
    
    Isn't `\` the default escape character? this should be read as the string "abc1" then, and not delimited. It would have to be `\\`, right? I'm not talking about Scala string escaping, but CSV here.
    
    Or is the point that delimiting takes precedence?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22654#discussion_r224626882
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -1826,4 +1826,13 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
         val df = spark.read.option("enforceSchema", false).csv(input)
         checkAnswer(df, Row("1", "2"))
       }
    +
    +  test("using the backward slash as the delimiter") {
    +    val input = Seq("""abc\1""").toDS()
    --- End diff --
    
    If a user specified `\` as a delimiter, we should issue an exception message and let users add the escape symbol i.e., `\\`. It is weird to see both `\\` and `\` are representing the same thing `\`. We should be consistent for handling backslash in all the cases.  


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    @gatorsmile @srowen Thank you for your work.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    jenkins, retest this, please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    **[Test build #97048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97048/testReport)** for PR 22654 at commit [`7bf453a`](https://github.com/apache/spark/commit/7bf453a5a5ddedda4319d114bbbb6b8c125fa0d4).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22654#discussion_r223749951
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
    @@ -97,23 +97,22 @@ object CSVUtils {
        */
       @throws[IllegalArgumentException]
       def toChar(str: String): Char = {
    -    if (str.charAt(0) == '\\') {
    -      str.charAt(1)
    -      match {
    -        case 't' => '\t'
    -        case 'r' => '\r'
    -        case 'b' => '\b'
    -        case 'f' => '\f'
    -        case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
    -        case '\'' => '\''
    -        case 'u' if str == """\u0000""" => '\u0000'
    -        case _ =>
    -          throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
    -      }
    -    } else if (str.length == 1) {
    -      str.charAt(0)
    -    } else {
    -      throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
    +    (str: Seq[Char]) match {
    +      case Seq() => throw new IllegalArgumentException("Delimiter cannot be empty string")
    +      case Seq(c) => c
    --- End diff --
    
    I would prefer more declarative way and less nested levels of controls. but this is personal opinion. Let's look at your example:
    ```
    str.length
    ```
    you didn't check that str can be null. 
    
    ```
    case 2 if str(0) == '\\' =>
    case 'u' if str == """\u0000""" => '\0'
    ```
    If it has length 2, how `str` could be `"""\u0000"""`?
    
    ```
    case c if """trbf"'\""".contains(c) => c
    ```
    You should produce control chars not just second char. For example: `\t` -> Seq('\', 't') -> '\t`.
    
    In my approach, everything is simple. One input case is mapped to one output. There is no unnecessary complexity.  



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    **[Test build #97306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97306/testReport)** for PR 22654 at commit [`20856b4`](https://github.com/apache/spark/commit/20856b4132cbc6aa34484144112f3463e47c4906).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22654#discussion_r223579444
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
    @@ -97,23 +97,21 @@ object CSVUtils {
        */
       @throws[IllegalArgumentException]
       def toChar(str: String): Char = {
    -    if (str.charAt(0) == '\\') {
    -      str.charAt(1)
    -      match {
    -        case 't' => '\t'
    -        case 'r' => '\r'
    -        case 'b' => '\b'
    -        case 'f' => '\f'
    -        case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
    -        case '\'' => '\''
    -        case 'u' if str == """\u0000""" => '\u0000'
    -        case _ =>
    -          throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
    -      }
    -    } else if (str.length == 1) {
    -      str.charAt(0)
    -    } else {
    -      throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
    +    (str: Seq[Char]) match {
    +      case Seq() => throw new IllegalArgumentException(s"Delimiter cannot be empty string")
    --- End diff --
    
    Nit: remove `s`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22654#discussion_r223581611
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
    @@ -97,23 +97,21 @@ object CSVUtils {
        */
       @throws[IllegalArgumentException]
       def toChar(str: String): Char = {
    -    if (str.charAt(0) == '\\') {
    -      str.charAt(1)
    -      match {
    -        case 't' => '\t'
    -        case 'r' => '\r'
    -        case 'b' => '\b'
    -        case 'f' => '\f'
    -        case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
    -        case '\'' => '\''
    -        case 'u' if str == """\u0000""" => '\u0000'
    -        case _ =>
    -          throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
    -      }
    -    } else if (str.length == 1) {
    -      str.charAt(0)
    -    } else {
    -      throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
    +    (str: Seq[Char]) match {
    +      case Seq() => throw new IllegalArgumentException(s"Delimiter cannot be empty string")
    +      case Seq(c) => c
    +      case Seq('\\', 't') => '\t'
    +      case Seq('\\', 'r') => '\r'
    +      case Seq('\\', 'b') => '\b'
    +      case Seq('\\', 'f') => '\f'
    +      // In case user changes quote char and uses \" as delimiter in options
    +      case Seq('\\', '\"') => '\"'
    +      case Seq('\\', '\'') => '\''
    --- End diff --
    
    Regarding the backslash support, I think we should follow https://docs.oracle.com/javase/tutorial/java/data/characters.html 
    `case Seq('\\', '\\') => '\\'` 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22654#discussion_r223751904
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
    @@ -97,23 +97,22 @@ object CSVUtils {
        */
       @throws[IllegalArgumentException]
       def toChar(str: String): Char = {
    -    if (str.charAt(0) == '\\') {
    -      str.charAt(1)
    -      match {
    -        case 't' => '\t'
    -        case 'r' => '\r'
    -        case 'b' => '\b'
    -        case 'f' => '\f'
    -        case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
    -        case '\'' => '\''
    -        case 'u' if str == """\u0000""" => '\u0000'
    -        case _ =>
    -          throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
    -      }
    -    } else if (str.length == 1) {
    -      str.charAt(0)
    -    } else {
    -      throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
    +    (str: Seq[Char]) match {
    +      case Seq() => throw new IllegalArgumentException("Delimiter cannot be empty string")
    +      case Seq(c) => c
    --- End diff --
    
    Ah yeah good points. This is too clever, off the top of my head. I still wonder if the code here can reduce the duplication of `Seq('\\', c) => '\c'` but I don't see a way that actually works, yeah. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    **[Test build #97153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97153/testReport)** for PR 22654 at commit [`1c2ac25`](https://github.com/apache/spark/commit/1c2ac25c4c225ab63591976bbd0eec866ae40d06).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22654#discussion_r223623311
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
    @@ -97,23 +97,21 @@ object CSVUtils {
        */
       @throws[IllegalArgumentException]
       def toChar(str: String): Char = {
    -    if (str.charAt(0) == '\\') {
    -      str.charAt(1)
    -      match {
    -        case 't' => '\t'
    -        case 'r' => '\r'
    -        case 'b' => '\b'
    -        case 'f' => '\f'
    -        case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
    -        case '\'' => '\''
    -        case 'u' if str == """\u0000""" => '\u0000'
    -        case _ =>
    -          throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
    -      }
    -    } else if (str.length == 1) {
    -      str.charAt(0)
    -    } else {
    -      throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
    +    (str: Seq[Char]) match {
    +      case Seq() => throw new IllegalArgumentException(s"Delimiter cannot be empty string")
    --- End diff --
    
    removed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97045/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    @gatorsmile Please, take a look at this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    **[Test build #97153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97153/testReport)** for PR 22654 at commit [`1c2ac25`](https://github.com/apache/spark/commit/1c2ac25c4c225ab63591976bbd0eec866ae40d06).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22654#discussion_r223754099
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -1826,4 +1826,13 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
         val df = spark.read.option("enforceSchema", false).csv(input)
         checkAnswer(df, Row("1", "2"))
       }
    +
    +  test("using the backward slash as the delimiter") {
    +    val input = Seq("""abc\1""").toDS()
    --- End diff --
    
    > Or is the point that delimiting takes precedence?
    
    Right, if an user specified `\` as a delimiter, CSV parser considers it as the delimiter first of all. We can see that in the main loop that delimiters are handled before escape characters:
    https://github.com/uniVocity/univocity-parsers/blob/6746adc2ddb420ebba7441339887e4bbc35cf087/src/main/java/com/univocity/parsers/csv/CsvParser.java#L115


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    **[Test build #97306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97306/testReport)** for PR 22654 at commit [`20856b4`](https://github.com/apache/spark/commit/20856b4132cbc6aa34484144112f3463e47c4906).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22654
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97153/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org