Posted to commits@spark.apache.org by we...@apache.org on 2018/09/17 11:22:35 UTC
spark git commit: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS path for loadtable command.
Repository: spark
Updated Branches:
refs/heads/master b66e14dc9 -> 619c94901
[SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS path for loadtable command.
What changes were proposed in this pull request?
Updated the migration guide to document the behavior changes made in JIRA issue SPARK-23425.
How was this patch tested?
Manually verified.
Closes #22396 from sujith71955/master_newtest.
Authored-by: s71955 <su...@gmail.com>
Signed-off-by: Wenchen Fan <we...@databricks.com>
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/619c9490
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/619c9490
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/619c9490
Branch: refs/heads/master
Commit: 619c949019feccd3fc2c9e58a841c655d05216f3
Parents: b66e14d
Author: s71955 <su...@gmail.com>
Authored: Mon Sep 17 19:22:27 2018 +0800
Committer: Wenchen Fan <we...@databricks.com>
Committed: Mon Sep 17 19:22:27 2018 +0800
----------------------------------------------------------------------
docs/sql-programming-guide.md | 1 +
.../spark/sql/hive/execution/SQLQuerySuite.scala | 15 +++++++++++++++
2 files changed, 16 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/619c9490/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 9da7d64..e262987 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
- Since Spark 2.4, file listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`.
- Since Spark 2.4, metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during statistics computation.
- Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In version 2.3 and earlier, empty strings were treated the same as `null` values and were not written as any characters in saved CSV files. For example, the row `"a", null, "", 1` was written as `a,,,1`. Since Spark 2.4, the same row is saved as `a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to an empty (unquoted) string.
+ - Since Spark 2.4, the LOAD DATA command supports the wildcards `?` and `*`, which match exactly one character and zero or more characters, respectively. Example: `LOAD DATA INPATH '/tmp/folder*/'` or `LOAD DATA INPATH '/tmp/part-?'`. Special characters such as spaces now also work in paths. Example: `LOAD DATA INPATH '/tmp/folder name/'`.
## Upgrading From Spark SQL 2.3.0 to 2.3.1 and above
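The migration note above describes two wildcards: `?` matches exactly one character and `*` matches zero or more. A minimal Scala sketch of those semantics follows, assuming a plain glob-to-regex translation. `GlobSketch` is a hypothetical helper for illustration only, not Spark's or Hadoop's implementation; it deliberately ignores other glob syntax such as `[...]` and `{a,b}`, escaping, and path-separator handling.

```scala
import java.util.regex.Pattern

// Simplified, hypothetical model of the two LOAD DATA wildcards:
//   `?` matches exactly one character, `*` matches zero or more characters.
// Not Spark's or Hadoop's implementation: `[...]`, `{a,b}`, escapes and
// path-separator rules are deliberately ignored.
object GlobSketch {
  // Translate a glob into a regex; literal runs are quoted so `.` etc. stay literal.
  def globToRegex(glob: String): Pattern = {
    val regex = new StringBuilder
    val literal = new StringBuilder
    def flushLiteral(): Unit =
      if (literal.nonEmpty) {
        regex.append(Pattern.quote(literal.toString))
        literal.clear()
      }
    glob.foreach {
      case '?' => flushLiteral(); regex.append(".")
      case '*' => flushLiteral(); regex.append(".*")
      case c   => literal.append(c)
    }
    flushLiteral()
    Pattern.compile(regex.toString)
  }

  // `matches` requires the whole name to match, as a glob does.
  def matches(glob: String, name: String): Boolean =
    globToRegex(glob).matcher(name).matches()

  def main(args: Array[String]): Unit = {
    val names = Seq("part-1", "part-12", "part-", "folder name")
    println(names.filter(n => matches("part-?", n)))  // List(part-1)
    println(names.filter(n => matches("part-*", n)))  // List(part-1, part-12, part-)
    println(matches("folder name", "folder name"))    // true
  }
}
```

Quoting each literal run keeps characters like `.` and spaces literal, so only `?` and `*` carry special meaning in this model.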
http://git-wip-us.apache.org/repos/asf/spark/blob/619c9490/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
index 20c4c36..e49aea2 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
@@ -1916,6 +1916,21 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
+  test("SPARK-23425 Test LOAD DATA LOCAL INPATH with space in file name") {
+    withTempDir { dir =>
+      val path = dir.toURI.toString.stripSuffix("/")
+      val dirPath = dir.getAbsoluteFile
+      for (i <- 1 to 3) {
+        Files.write(s"$i", new File(dirPath, s"part-r-0000 $i"), StandardCharsets.UTF_8)
+      }
+      withTable("load_t") {
+        sql("CREATE TABLE load_t (a STRING)")
+        sql(s"LOAD DATA LOCAL INPATH '$path/part-r-0000 1' INTO TABLE load_t")
+        checkAnswer(sql("SELECT * FROM load_t"), Seq(Row("1")))
+      }
+    }
+  }
+
   test("Support wildcard character in folderlevel for LOAD DATA LOCAL INPATH") {
     withTempDir { dir =>
       val path = dir.toURI.toString.stripSuffix("/")
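The new test writes three files whose names contain a space and loads one of them by its literal name. The Scala sketch below reproduces only the file-system side of that setup with plain `java.nio` (no Spark), to show that a space behaves as an ordinary character both in a literal name and in a `?` glob. It uses `java.nio`'s glob `PathMatcher` as a stand-in, which is an assumption for illustration and not the code path Spark's LOAD DATA command uses; `SpacedNamesSketch` and `matchingNames` are hypothetical names.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{FileSystems, Files, Path}
import scala.collection.JavaConverters._

// Hypothetical sketch mirroring the new test's setup without Spark: three files
// whose names contain a space, matched by literal name and by a `?` glob.
// java.nio's glob PathMatcher is a stand-in here, not what LOAD DATA uses.
object SpacedNamesSketch {
  // Returns the sorted names of entries in `dir` whose file name matches `glob`.
  def matchingNames(dir: Path, glob: String): List[String] = {
    val matcher = FileSystems.getDefault.getPathMatcher(s"glob:$glob")
    val stream = Files.list(dir)
    try {
      stream.iterator().asScala
        .map(_.getFileName)
        .filter(name => matcher.matches(name))
        .map(_.toString)
        .toList
        .sorted
    } finally {
      stream.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("load-data-sketch")
    try {
      // Same naming scheme as the test: "part-r-0000 1" .. "part-r-0000 3".
      for (i <- 1 to 3) {
        Files.write(dir.resolve(s"part-r-0000 $i"), s"$i".getBytes(StandardCharsets.UTF_8))
      }
      println(matchingNames(dir, "part-r-0000 1")) // List(part-r-0000 1)
      println(matchingNames(dir, "part-r-0000 ?")) // List(part-r-0000 1, part-r-0000 2, part-r-0000 3)
    } finally {
      // Clean up the files, then the directory itself.
      matchingNames(dir, "*").foreach(n => Files.delete(dir.resolve(n)))
      Files.delete(dir)
    }
  }
}
```

Like the Spark test, the sketch keeps the space inside the name itself rather than escaping or encoding it.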