Posted to reviews@spark.apache.org by patrickotoole <gi...@git.apache.org> on 2014/05/11 08:07:03 UTC

[GitHub] spark pull request: SPARK-1795 - Add recursive directory file sear...

Github user patrickotoole commented on a diff in the pull request:

    https://github.com/apache/spark/pull/537#discussion_r12507593
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -327,18 +327,18 @@ class StreamingContext private[streaming] (
        * @param directory HDFS directory to monitor for new file
        * @param filter Function to filter paths to process
        * @param newFilesOnly Should process only new files and ignore existing files in the directory
    +   * @param recursive Should search through the directory recursively to find new files
        * @tparam K Key type for reading HDFS file
        * @tparam V Value type for reading HDFS file
        * @tparam F Input format for reading HDFS file
        */
       def fileStream[
    -    K: ClassTag,
    -    V: ClassTag,
    -    F <: NewInputFormat[K, V]: ClassTag
    -  ] (directory: String, filter: Path => Boolean, newFilesOnly: Boolean): InputDStream[(K, V)] = {
    -    new FileInputDStream[K, V, F](this, directory, filter, newFilesOnly)
    -  }
    -
    +      K: ClassTag,
    +      V: ClassTag,
    +      F <: NewInputFormat[K, V]: ClassTag
    +    ] (directory: String, filter: Path => Boolean, newFilesOnly: Boolean, recursive: Boolean): DStream[(K, V)] = {
    +      new FileInputDStream[K, V, F](this, directory, filter, newFilesOnly, recursive)
    +  } 
    --- End diff --
    
    I have given the new recursive parameter a default value in the FileInputDStream constructor, but not in the public StreamingContext API itself.
    
    Should we also introduce default values in the more granular version of the API? Currently, the exposed API appears to have two versions of each of these methods: one that assumes default values and one that exposes all the parameters of the DStream constructor.
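A minimal sketch of the two designs under discussion, written against a hypothetical, Spark-free stand-in. FileStreamParams, OverloadedApi and DefaultArgsApi are invented names for illustration only; the filter takes a plain String path rather than a Hadoop Path, and the defaults (skip dot-prefixed files, newFilesOnly = true, recursive = false) are assumptions, not necessarily Spark's. Design A keeps a short overload plus a full overload with no defaults; Design B puts default values on the full method.

```scala
// Hypothetical, simplified stand-in for the API discussed above.
// It does NOT use Spark; it only records which parameters a call resolved to.
final case class FileStreamParams(
    directory: String,
    filter: String => Boolean,
    newFilesOnly: Boolean,
    recursive: Boolean
)

// Design A: two explicit overloads, mirroring the "assumes defaults" version
// and the "exposes every parameter" version of the API.
object OverloadedApi {
  // Assumed default filter for illustration: reject files whose name starts with ".".
  private val defaultFilter: String => Boolean =
    path => !path.split('/').last.startsWith(".")

  // Short form: the default filter, newFilesOnly = true and recursive = false are implied.
  def fileStream(directory: String): FileStreamParams =
    fileStream(directory, defaultFilter, newFilesOnly = true, recursive = false)

  // Full form: every parameter must be supplied explicitly.
  def fileStream(
      directory: String,
      filter: String => Boolean,
      newFilesOnly: Boolean,
      recursive: Boolean
  ): FileStreamParams =
    FileStreamParams(directory, filter, newFilesOnly, recursive)
}

// Design B: a single method whose trailing parameters carry default values,
// so call sites written before `recursive` existed still compile.
object DefaultArgsApi {
  def fileStream(
      directory: String,
      filter: String => Boolean = path => !path.split('/').last.startsWith("."),
      newFilesOnly: Boolean = true,
      recursive: Boolean = false
  ): FileStreamParams =
    FileStreamParams(directory, filter, newFilesOnly, recursive)
}

// Usage: with Design B a caller can opt into recursion by name alone.
val p = DefaultArgsApi.fileStream("hdfs:///logs", recursive = true)
println(s"newFilesOnly=${p.newFilesOnly}, recursive=${p.recursive}")
```

One Scala constraint bears on mixing the two designs: among the overloaded alternatives of a method name, at most one may declare default arguments, so adding defaults to the full form while keeping the short overload is allowed, but defaults cannot go on both.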
    
    Thoughts?

