Posted to commits@spark.apache.org by gu...@apache.org on 2018/05/24 14:19:18 UTC
spark git commit: [SPARK-24329][SQL] Test for skipping multi-space lines
Repository: spark
Updated Branches:
refs/heads/master 3469f5c98 -> 13bedc05c
[SPARK-24329][SQL] Test for skipping multi-space lines
## What changes were proposed in this pull request?
This PR is a continuation of https://github.com/apache/spark/pull/21380 . It tests cases that are handled by this code:
https://github.com/apache/spark/blob/e3de6ab30d52890eb08578e55eb4a5d2b4e7aa35/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala#L303-L304
In short, the code skips lines that consist of one or more whitespace characters, as well as comment lines (see [filterCommentAndEmpty](https://github.com/apache/spark/blob/e3de6ab30d52890eb08578e55eb4a5d2b4e7aa35/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala#L47)):
```scala
iter.filter { line =>
  line.trim.nonEmpty && !line.startsWith(options.comment.toString)
}
```
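For illustration only, here is a minimal standalone sketch of the same predicate applied to a few sample lines. The object name, the method name `skipCommentsAndBlanks`, and the sample input are hypothetical and not part of Spark; the `comment` parameter stands in for `options.comment`.

```scala
// Standalone sketch of the predicate quoted above (no Spark dependency).
// All names here are hypothetical and chosen for illustration.
object FilterSketch {
  // Keep a line only if it has non-whitespace content and does not
  // start with the comment character.
  def skipCommentsAndBlanks(lines: Iterator[String], comment: Char): Iterator[String] =
    lines.filter { line =>
      line.trim.nonEmpty && !line.startsWith(comment.toString)
    }

  // Sample input loosely modelled on comments-whitespaces.csv.
  val sample: Seq[String] = Seq(
    "# The file contains comments, whitespaces and empty lines",
    "colA",
    "",          // empty line
    "   ",       // line with a few whitespaces
    "# a comment",
    " \"a\" "    // value with leading and trailing whitespace
  )

  val kept: List[String] = skipCommentsAndBlanks(sample.iterator, '#').toList

  def main(args: Array[String]): Unit =
    kept.foreach(l => println(s"[$l]"))
}
```

Note that, as written, the predicate checks `startsWith` on the untrimmed line, so in this sketch a comment marker preceded by whitespace would not be treated as a comment.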
Closes #21380
## How was this patch tested?
Added a test for the case described above.
Author: Maxim Gekk <ma...@databricks.com>
Author: Maxim Gekk <ma...@gmail.com>
Closes #21394 from MaxGekk/test-for-multi-space-lines.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/13bedc05
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/13bedc05
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/13bedc05
Branch: refs/heads/master
Commit: 13bedc05c28fcc6e739fb472bd2ee3035fa11648
Parents: 3469f5c
Author: Maxim Gekk <ma...@databricks.com>
Authored: Thu May 24 22:18:58 2018 +0800
Committer: hyukjinkwon <gu...@apache.org>
Committed: Thu May 24 22:18:58 2018 +0800
----------------------------------------------------------------------
.../resources/test-data/comments-whitespaces.csv | 8 ++++++++
.../sql/execution/datasources/csv/CSVSuite.scala | 15 +++++++++++++++
2 files changed, 23 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/13bedc05/sql/core/src/test/resources/test-data/comments-whitespaces.csv
----------------------------------------------------------------------
diff --git a/sql/core/src/test/resources/test-data/comments-whitespaces.csv b/sql/core/src/test/resources/test-data/comments-whitespaces.csv
new file mode 100644
index 0000000..2737978
--- /dev/null
+++ b/sql/core/src/test/resources/test-data/comments-whitespaces.csv
@@ -0,0 +1,8 @@
+# The file contains comments, whitespaces and empty lines
+colA
+# empty line
+
+# the line with a few whitespaces
+
+# int value with leading and trailing whitespaces
+ "a"
http://git-wip-us.apache.org/repos/asf/spark/blob/13bedc05/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
index 07e6c74..2bac1a3 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
@@ -1368,4 +1368,19 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
       checkAnswer(computed, expected)
     }
   }
+
+  test("SPARK-24329: skip lines with comments, and one or multiple whitespaces") {
+    val schema = new StructType().add("colA", StringType)
+    val ds = spark
+      .read
+      .schema(schema)
+      .option("multiLine", false)
+      .option("header", true)
+      .option("comment", "#")
+      .option("ignoreLeadingWhiteSpace", false)
+      .option("ignoreTrailingWhiteSpace", false)
+      .csv(testFile("test-data/comments-whitespaces.csv"))
+
+    checkAnswer(ds, Seq(Row(""" "a" """)))
+  }
 }