Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/06/12 22:46:21 UTC

[GitHub] spark pull request #17702: [SPARK-20408][SQL] Get the glob path in parallel ...

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17702#discussion_r194911415
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
    @@ -252,6 +252,18 @@ class SparkHadoopUtil extends Logging {
         if (isGlobPath(pattern)) globPath(fs, pattern) else Seq(pattern)
       }
     
    +  def expandGlobPath(fs: FileSystem, pattern: Path): Seq[String] = {
    +    val arr = pattern.toString.split("/")
    --- End diff ---
    
    we should not parse the path string ourselves; it's too risky, and we may miss special cases like Windows paths, escape characters, etc. Let's take a look at `org.apache.hadoop.fs.Globber` and see if we can reuse some parser API there.
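    To illustrate the concern, here is a hypothetical sketch (not Spark or Hadoop code; `GlobSplitSketch`, `hasGlob`, and `naiveSplit` are invented names) of the `split("/")` approach the diff takes. It works for simple POSIX-style patterns but silently mishandles a Windows-style path, which is exactly the kind of edge case `org.apache.hadoop.fs.Globber` already accounts for:

    ```scala
    object GlobSplitSketch {
      // Glob metacharacters, roughly matching what Hadoop's Globber recognizes.
      private val globChars = Set('*', '?', '[', ']', '{', '}')

      // True if a single path component contains an unescaped glob metacharacter.
      def hasGlob(component: String): Boolean = {
        var escaped = false
        component.exists { c =>
          if (escaped) { escaped = false; false }
          else if (c == '\\') { escaped = true; false }
          else globChars.contains(c)
        }
      }

      // The risky approach: split on '/' and cut at the first glob component,
      // returning (literal prefix components, glob suffix components).
      def naiveSplit(pattern: String): (Seq[String], Seq[String]) = {
        val parts = pattern.split("/").toSeq
        val idx = parts.indexWhere(hasGlob)
        if (idx < 0) (parts, Nil) else parts.splitAt(idx)
      }
    }
    ```

    For `/data/2018/*/part-*` this yields the expected prefix `("", "data", "2018")` and glob suffix `("*", "part-*")`. But for `C:\data\*.csv` there is no `/` at all, the `\*` reads as an escaped star, and the whole pattern is misclassified as a single literal component, so the glob would never be expanded.
    
    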


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org