Posted to commits@spark.apache.org by we...@apache.org on 2018/03/05 22:20:14 UTC
spark git commit: [SPARK-23434][SQL][BRANCH-2.3] Spark should not
warn `metadata directory` for a HDFS file path
Repository: spark
Updated Branches:
refs/heads/branch-2.3 911b83da4 -> b9ea2e87b
[SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `metadata directory` for a HDFS file path
## What changes were proposed in this pull request?
In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), it logs a misleading warning while looking up `people.json/_spark_metadata`. The root cause is a behavioral difference between `LocalFileSystem` and `DistributedFileSystem`: for this lookup, `LocalFileSystem.exists()` returns `false`, but `DistributedFileSystem.exists()` throws `org.apache.hadoop.security.AccessControlException`. This PR probes for the metadata directory only when the given path is a directory.
```scala
scala> spark.read.json("file:///usr/hdp/current/spark-client/examples/src/main/resources/people.json").show
+----+-------+
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
+----+-------+
scala> spark.read.json("hdfs:///tmp/people.json")
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
```
After this PR,
```scala
scala> spark.read.json("hdfs:///tmp/people.json").show
+----+-------+
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
+----+-------+
```
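The following is a minimal sketch of why checking for a directory first suppresses the warning. It does not use Hadoop or Spark: `MiniFs`, `StrictFs`, and `MetadataCheck` are hypothetical names introduced for illustration, paths are plain strings, and `SecurityException` stands in for `AccessControlException`. The toy file system is assumed to throw when asked about a child of a regular file, which mimics the `DistributedFileSystem` behavior described above.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for the two file system calls the patch relies on.
trait MiniFs {
  def isDirectory(path: String): Boolean
  def exists(path: String): Boolean
}

// Toy file system (an assumption, not a Hadoop class): probing a child of a
// regular file throws instead of returning false.
final class StrictFs(files: Set[String], dirs: Set[String]) extends MiniFs {
  def isDirectory(path: String): Boolean = dirs.contains(path)

  def exists(path: String): Boolean = {
    val parent = path.substring(0, math.max(path.lastIndexOf('/'), 0))
    if (files.contains(parent)) {
      throw new SecurityException(s"Permission denied: $path")
    }
    files.contains(path) || dirs.contains(path)
  }
}

object MetadataCheck {
  val metadataDir = "_spark_metadata"

  // Counts caught errors; each one corresponds to a logged warning.
  var warnings = 0

  // Shape of the lookup before the patch: always probe the child path.
  def hasMetadataBefore(fs: MiniFs, path: String): Boolean =
    try fs.exists(s"$path/$metadataDir")
    catch { case NonFatal(_) => warnings += 1; false }

  // Shape of the lookup after the patch: probe only under a directory.
  def hasMetadataAfter(fs: MiniFs, path: String): Boolean =
    try {
      if (fs.isDirectory(path)) fs.exists(s"$path/$metadataDir") else false
    } catch { case NonFatal(_) => warnings += 1; false }
}
```

With a regular file at `/tmp/people.json`, `hasMetadataBefore` catches the exception and counts a warning, while `hasMetadataAfter` returns `false` without ever calling `exists`. Both versions still return `true` for a directory that contains `_spark_metadata`.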
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <do...@apache.org>
Closes #20713 from dongjoon-hyun/SPARK-23434-2.3.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b9ea2e87
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b9ea2e87
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9ea2e87
Branch: refs/heads/branch-2.3
Commit: b9ea2e87bb24c3731bd2dbd044d10d18dbbf9c6f
Parents: 911b83d
Author: Dongjoon Hyun <do...@apache.org>
Authored: Mon Mar 5 14:20:10 2018 -0800
Committer: Wenchen Fan <we...@databricks.com>
Committed: Mon Mar 5 14:20:10 2018 -0800
----------------------------------------------------------------------
.../spark/sql/execution/streaming/FileStreamSink.scala | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/b9ea2e87/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
index 2715fa9..87a17ce 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
@@ -42,9 +42,11 @@ object FileStreamSink extends Logging {
         try {
           val hdfsPath = new Path(singlePath)
           val fs = hdfsPath.getFileSystem(hadoopConf)
-          val metadataPath = new Path(hdfsPath, metadataDir)
-          val res = fs.exists(metadataPath)
-          res
+          if (fs.isDirectory(hdfsPath)) {
+            fs.exists(new Path(hdfsPath, metadataDir))
+          } else {
+            false
+          }
         } catch {
           case NonFatal(e) =>
             logWarning(s"Error while looking for metadata directory.")