You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/10/06 10:15:49 UTC
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/22654
[SPARK-25660][SQL] Fix for the backward slash as CSV fields delimiter
## What changes were proposed in this pull request?
The PR addresses the exception raised on accessing chars out of delimiter string. In particular, the backward slash `\` as the CSV fields delimiter causes the following exception on reading `abc\1`:
```Scala
String index out of range: 1
java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at java.lang.String.charAt(String.java:658)
```
because `str.charAt(1)` tries to access a char out of `str` in `CSVUtils.toChar`
## How was this patch tested?
Added tests for empty string and string containing the backward slash to `CSVUtilsSuite`. Besides of that I added an end-to-end test to check how the backward slash is handled in reading CSV string with it.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 csv-slash-delim
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22654.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22654
----
commit dd16ca349dd20626b6a8ae00f6dd445c269104b8
Author: Maxim Gekk <ma...@...>
Date: 2018-10-06T09:12:33Z
Test for backward slash as the delimiter
commit 7bf453a5a5ddedda4319d114bbbb6b8c125fa0d4
Author: Maxim Gekk <ma...@...>
Date: 2018-10-06T09:55:15Z
Bug fix + tests
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22654
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97306/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22654
@gatorsmile Could you look at it one more time, please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22654
**[Test build #97045 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97045/testReport)** for PR 22654 at commit [`7bf453a`](https://github.com/apache/spark/commit/7bf453a5a5ddedda4319d114bbbb6b8c125fa0d4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22654
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22654
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22654
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r224799147
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
@@ -1826,4 +1826,13 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
val df = spark.read.option("enforceSchema", false).csv(input)
checkAnswer(df, Row("1", "2"))
}
+
+ test("using the backward slash as the delimiter") {
+ val input = Seq("""abc\1""").toDS()
--- End diff --
I prohibited single backslash and throw an exception with a tip of using double backslash.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r223726757
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
@@ -97,23 +97,22 @@ object CSVUtils {
*/
@throws[IllegalArgumentException]
def toChar(str: String): Char = {
- if (str.charAt(0) == '\\') {
- str.charAt(1)
- match {
- case 't' => '\t'
- case 'r' => '\r'
- case 'b' => '\b'
- case 'f' => '\f'
- case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
- case '\'' => '\''
- case 'u' if str == """\u0000""" => '\u0000'
- case _ =>
- throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
- }
- } else if (str.length == 1) {
- str.charAt(0)
- } else {
- throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
+ (str: Seq[Char]) match {
+ case Seq() => throw new IllegalArgumentException("Delimiter cannot be empty string")
+ case Seq(c) => c
--- End diff --
I'm missing why we had to switch up the case statement like this. I get that we need to cover more cases, but there was duplication and now there is a bit more. What about ...
```
str.length match {
case 0 => // error
case 1 => str(0)
case 2 if str(0) == '\\' =>
str(1) match {
case c if """trbf"'\""".contains(c) => c
case 'u' if str == """\u0000""" => '\0'
case _ => // error
}
case _ => // error
}
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22654
LGTM
Thanks! Merged to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22654
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22654
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22654
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97048/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22654
**[Test build #97045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97045/testReport)** for PR 22654 at commit [`7bf453a`](https://github.com/apache/spark/commit/7bf453a5a5ddedda4319d114bbbb6b8c125fa0d4).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r223635442
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
@@ -97,23 +97,21 @@ object CSVUtils {
*/
@throws[IllegalArgumentException]
def toChar(str: String): Char = {
- if (str.charAt(0) == '\\') {
- str.charAt(1)
- match {
- case 't' => '\t'
- case 'r' => '\r'
- case 'b' => '\b'
- case 'f' => '\f'
- case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
- case '\'' => '\''
- case 'u' if str == """\u0000""" => '\u0000'
- case _ =>
- throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
- }
- } else if (str.length == 1) {
- str.charAt(0)
- } else {
- throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
+ (str: Seq[Char]) match {
+ case Seq() => throw new IllegalArgumentException(s"Delimiter cannot be empty string")
+ case Seq(c) => c
+ case Seq('\\', 't') => '\t'
+ case Seq('\\', 'r') => '\r'
+ case Seq('\\', 'b') => '\b'
+ case Seq('\\', 'f') => '\f'
+ // In case user changes quote char and uses \" as delimiter in options
+ case Seq('\\', '\"') => '\"'
+ case Seq('\\', '\'') => '\''
--- End diff --
Added the case + a test for that
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22654
**[Test build #97048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97048/testReport)** for PR 22654 at commit [`7bf453a`](https://github.com/apache/spark/commit/7bf453a5a5ddedda4319d114bbbb6b8c125fa0d4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r223724099
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
@@ -1826,4 +1826,13 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
val df = spark.read.option("enforceSchema", false).csv(input)
checkAnswer(df, Row("1", "2"))
}
+
+ test("using the backward slash as the delimiter") {
+ val input = Seq("""abc\1""").toDS()
--- End diff --
Isn't `\` the default escape character? this should be read as the string "abc1" then, and not delimited. It would have to be `\\`, right? I'm not talking about Scala string escaping, but CSV here.
Or is the point that delimiting takes precedence?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r224626882
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
@@ -1826,4 +1826,13 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
val df = spark.read.option("enforceSchema", false).csv(input)
checkAnswer(df, Row("1", "2"))
}
+
+ test("using the backward slash as the delimiter") {
+ val input = Seq("""abc\1""").toDS()
--- End diff --
If a user specified `\` as a delimiter, we should issue an exception message and let users add the escape symbol i.e., `\\`. It is weird to see both `\\` and `\` are representing the same thing `\`. We should be consistent for handling backslash in all the cases.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22654
@gatorsmile @srowen Thank you for your work.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22654
jenkins, retest this, please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22654
**[Test build #97048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97048/testReport)** for PR 22654 at commit [`7bf453a`](https://github.com/apache/spark/commit/7bf453a5a5ddedda4319d114bbbb6b8c125fa0d4).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r223749951
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
@@ -97,23 +97,22 @@ object CSVUtils {
*/
@throws[IllegalArgumentException]
def toChar(str: String): Char = {
- if (str.charAt(0) == '\\') {
- str.charAt(1)
- match {
- case 't' => '\t'
- case 'r' => '\r'
- case 'b' => '\b'
- case 'f' => '\f'
- case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
- case '\'' => '\''
- case 'u' if str == """\u0000""" => '\u0000'
- case _ =>
- throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
- }
- } else if (str.length == 1) {
- str.charAt(0)
- } else {
- throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
+ (str: Seq[Char]) match {
+ case Seq() => throw new IllegalArgumentException("Delimiter cannot be empty string")
+ case Seq(c) => c
--- End diff --
I would prefer more declarative way and less nested levels of controls. but this is personal opinion. Let's look at your example:
```
str.length
```
you didn't check that str can be null.
```
case 2 if str(0) == '\\' =>
case 'u' if str == """\u0000""" => '\0'
```
If it has length 2, how `str` could be `"""\u0000"""`?
```
case c if """trbf"'\""".contains(c) => c
```
You should produce control chars not just second char. For example: `\t` -> Seq('\', 't') -> '\t`.
In my approach, everything is simple. One input case is mapped to one output. There is no unnecessary complexity.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22654
**[Test build #97306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97306/testReport)** for PR 22654 at commit [`20856b4`](https://github.com/apache/spark/commit/20856b4132cbc6aa34484144112f3463e47c4906).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r223579444
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
@@ -97,23 +97,21 @@ object CSVUtils {
*/
@throws[IllegalArgumentException]
def toChar(str: String): Char = {
- if (str.charAt(0) == '\\') {
- str.charAt(1)
- match {
- case 't' => '\t'
- case 'r' => '\r'
- case 'b' => '\b'
- case 'f' => '\f'
- case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
- case '\'' => '\''
- case 'u' if str == """\u0000""" => '\u0000'
- case _ =>
- throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
- }
- } else if (str.length == 1) {
- str.charAt(0)
- } else {
- throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
+ (str: Seq[Char]) match {
+ case Seq() => throw new IllegalArgumentException(s"Delimiter cannot be empty string")
--- End diff --
Nit: remove `s`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r223581611
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
@@ -97,23 +97,21 @@ object CSVUtils {
*/
@throws[IllegalArgumentException]
def toChar(str: String): Char = {
- if (str.charAt(0) == '\\') {
- str.charAt(1)
- match {
- case 't' => '\t'
- case 'r' => '\r'
- case 'b' => '\b'
- case 'f' => '\f'
- case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
- case '\'' => '\''
- case 'u' if str == """\u0000""" => '\u0000'
- case _ =>
- throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
- }
- } else if (str.length == 1) {
- str.charAt(0)
- } else {
- throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
+ (str: Seq[Char]) match {
+ case Seq() => throw new IllegalArgumentException(s"Delimiter cannot be empty string")
+ case Seq(c) => c
+ case Seq('\\', 't') => '\t'
+ case Seq('\\', 'r') => '\r'
+ case Seq('\\', 'b') => '\b'
+ case Seq('\\', 'f') => '\f'
+ // In case user changes quote char and uses \" as delimiter in options
+ case Seq('\\', '\"') => '\"'
+ case Seq('\\', '\'') => '\''
--- End diff --
Regarding the backslash support, I think we should follow https://docs.oracle.com/javase/tutorial/java/data/characters.html
`case Seq('\\', '\\') => '\\'`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r223751904
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
@@ -97,23 +97,22 @@ object CSVUtils {
*/
@throws[IllegalArgumentException]
def toChar(str: String): Char = {
- if (str.charAt(0) == '\\') {
- str.charAt(1)
- match {
- case 't' => '\t'
- case 'r' => '\r'
- case 'b' => '\b'
- case 'f' => '\f'
- case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
- case '\'' => '\''
- case 'u' if str == """\u0000""" => '\u0000'
- case _ =>
- throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
- }
- } else if (str.length == 1) {
- str.charAt(0)
- } else {
- throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
+ (str: Seq[Char]) match {
+ case Seq() => throw new IllegalArgumentException("Delimiter cannot be empty string")
+ case Seq(c) => c
--- End diff --
Ah yeah good points. This is too clever, off the top of my head. I still wonder if the code here can reduce the duplication of `Seq('\\', c) => '\c'` but I don't see a way that actually works, yeah.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22654
**[Test build #97153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97153/testReport)** for PR 22654 at commit [`1c2ac25`](https://github.com/apache/spark/commit/1c2ac25c4c225ab63591976bbd0eec866ae40d06).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r223623311
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
@@ -97,23 +97,21 @@ object CSVUtils {
*/
@throws[IllegalArgumentException]
def toChar(str: String): Char = {
- if (str.charAt(0) == '\\') {
- str.charAt(1)
- match {
- case 't' => '\t'
- case 'r' => '\r'
- case 'b' => '\b'
- case 'f' => '\f'
- case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
- case '\'' => '\''
- case 'u' if str == """\u0000""" => '\u0000'
- case _ =>
- throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
- }
- } else if (str.length == 1) {
- str.charAt(0)
- } else {
- throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
+ (str: Seq[Char]) match {
+ case Seq() => throw new IllegalArgumentException(s"Delimiter cannot be empty string")
--- End diff --
removed
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22654
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97045/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22654
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22654
@gatorsmile Please, take a look at this.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22654
**[Test build #97153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97153/testReport)** for PR 22654 at commit [`1c2ac25`](https://github.com/apache/spark/commit/1c2ac25c4c225ab63591976bbd0eec866ae40d06).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r223754099
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
@@ -1826,4 +1826,13 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
val df = spark.read.option("enforceSchema", false).csv(input)
checkAnswer(df, Row("1", "2"))
}
+
+ test("using the backward slash as the delimiter") {
+ val input = Seq("""abc\1""").toDS()
--- End diff --
> Or is the point that delimiting takes precedence?
Right, if an user specified `\` as a delimiter, CSV parser considers it as the delimiter first of all. We can see that in the main loop that delimiters are handled before escape characters:
https://github.com/uniVocity/univocity-parsers/blob/6746adc2ddb420ebba7441339887e4bbc35cf087/src/main/java/com/univocity/parsers/csv/CsvParser.java#L115
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22654
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22654
**[Test build #97306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97306/testReport)** for PR 22654 at commit [`20856b4`](https://github.com/apache/spark/commit/20856b4132cbc6aa34484144112f3463e47c4906).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22654
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97153/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org