Posted to reviews@spark.apache.org by emres <gi...@git.apache.org> on 2015/04/09 12:07:55 UTC

[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

GitHub user emres opened a pull request:

    https://github.com/apache/spark/pull/5438

    SPARK-3276 Added a new configuration spark.streaming.minRememberDuration

    SPARK-3276 Added a new configuration parameter ``spark.streaming.minRememberDuration``, with a default value of 1 minute.
    
    This way, when a Spark Streaming application is started, an arbitrary number of minutes can be specified as the threshold for remembering files.
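
A minimal sketch of the lookup pattern this PR introduces: read a property with a default of 1 minute. A plain `Map` stands in for `SparkConf` here (the real class needs a Spark runtime), and the helper name is illustrative, not Spark's actual internals.

```scala
// Hypothetical helper mirroring the PR's idea: look up the remember duration
// (in minutes) from configuration, defaulting to 1 when it is not set.
def minRememberMinutes(conf: Map[String, String]): Long =
  conf.getOrElse("spark.streaming.minRememberDuration", "1").toLong
```

With no override, the 1-minute default applies; setting the key to `"5"` yields 5 minutes.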

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/emres/spark SPARK-3276

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5438.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5438
    
----
commit 43cc1cec08b649e3d748f4a4f470fca834b66bdb
Author: emres <em...@gmail.com>
Date:   2015-04-09T09:58:05Z

    SPARK-3276 Added a new configuration parameter spark.streaming.minRememberDuration, with a default value of 1 minute.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by emres <gi...@git.apache.org>.
Github user emres commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-92863531
  
    @srowen I see that [SPARK-5931](https://issues.apache.org/jira/browse/SPARK-5931) is marked as CLOSED (Fixed), and the associated Pull Request has been closed. How does that affect this pull request of mine? Anything else to be done?



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-92869106
  
    Yes, have a look at the changes. You can and should now create a property with a value like "60s" like the new properties in that PR.
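
The "60s"-style values referenced here are parsed by suffix. The following is a hedged sketch of that parsing in the spirit of `SparkConf.getTimeAsSeconds`; it is not Spark's actual implementation, and the function name is made up for illustration.

```scala
// Illustrative parser: convert a time string with an optional unit suffix
// ("ms", "s", "m") into whole seconds. Sub-second values truncate, as a
// seconds-granularity getter would.
def parseTimeAsSeconds(s: String): Long = {
  val t = s.trim.toLowerCase
  if (t.endsWith("ms")) t.dropRight(2).toLong / 1000   // milliseconds, truncated
  else if (t.endsWith("s")) t.dropRight(1).toLong      // seconds
  else if (t.endsWith("m")) t.dropRight(1).toLong * 60 // minutes
  else t.toLong                                        // bare number: assume seconds
}
```

So a default of `"60s"` and a user-supplied `"2m"` both resolve to a plain number of seconds.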



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-91182368
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-94933484
  
    ok to test



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by emres <gi...@git.apache.org>.
Github user emres commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-94433898
  
    @srowen fair enough. I wonder if @tdas will have time to check this PR. (We are already planning to use this feature for one of our existing customers, it'd be great if it became part of the official master branch.)



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-94267803
  
    I suspect this is OK. I wonder if we should simply leave this undocumented for now as it's somewhat obscure, rather than completely commit to it as an API. That might make this easier to merge.



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by emres <gi...@git.apache.org>.
Github user emres commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-92914145
  
    @srowen I prepared another commit in which I'm now using the newly added ``getTimeAsSeconds`` method and passing the default value as "60s". Anything else to be done?



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-92338783
  
    Yes, I think you can change the prop name. I don't see a "minutes" designator, but might as well use seconds. I can sort of imagine wanting to specify a value less than 60 seconds, so the extra granularity might be beneficial. I don't think there are any special methods being added for specific properties.



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5438#discussion_r28122332
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala ---
    @@ -331,19 +331,22 @@ private[streaming]
     object FileInputDStream {
     
       /**
    -   * Minimum duration of remembering the information of selected files. Files with mod times
    -   * older than this "window" of remembering will be ignored. So if new files are visible
    -   * within this window, then the file will get selected in the next batch.
    +   * Minimum duration of remembering the information of selected files. Defaults to 1 minute.
    +   *
    +   * Files with mod times older than this "window" of remembering will be ignored. So if new
    +   * files are visible within this window, then the file will get selected in the next batch.
        */
    -  private val MIN_REMEMBER_DURATION = Minutes(1)
    +  private val minRememberDurationMin = Minutes(new SparkConf()
    --- End diff --
    
    IIUC I think what @srowen mentions is to put `minRememberDurationMin` related code into class `FileInputDStream`, rather than in the object. By putting this into the class `FileInputDStream`, you could use conf from StreamingContext, no need to create a new one again.
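
    The design point can be sketched with stand-in classes (`FakeConf` and `FakeContext` below are illustrative stand-ins, not Spark's `SparkConf`/`StreamingContext`): a value defined on the companion object that calls `new SparkConf()` sees none of the user's settings, while a value on the class can read the context's already-configured conf.

```scala
// Stand-ins for SparkConf and StreamingContext, for illustration only.
class FakeConf(settings: Map[String, String] = Map.empty) {
  def get(key: String, default: String): String = settings.getOrElse(key, default)
}
class FakeContext(val conf: FakeConf)

// Anti-pattern: the object builds a *fresh* conf, so user settings are ignored.
object FileInputDStreamLike {
  val minRememberFromNewConf: Long =
    new FakeConf().get("spark.streaming.minRememberDurationMin", "1").toLong
}

// Preferred: the class reads from the context's configured conf.
class FileInputDStreamLike(ctx: FakeContext) {
  val minRememberMin: Long =
    ctx.conf.get("spark.streaming.minRememberDurationMin", "1").toLong
}
```

With a context configured to `"5"`, the class-level value sees 5 while the object-level value still falls back to the default of 1.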



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by emres <gi...@git.apache.org>.
Github user emres commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-92337760
  
    @srowen fair enough. It looks like if [SPARK-5931](https://issues.apache.org/jira/browse/SPARK-5931) is resolved, I can go back to using ``spark.streaming.minRememberDuration`` as a Spark configuration property name, right? And I also think I will **have to** pass the value as "60s" (will there be a unit designation for minutes, or is it going to be only 's' for seconds and 'ms' for milliseconds?)
    
    Do you foresee any other changes? E.g., will there be a new method to retrieve the value of the duration parameter such as ``spark.streaming.minRememberDuration``?



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5438#discussion_r28232809
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala ---
    @@ -80,6 +80,15 @@ class FileInputDStream[K, V, F <: NewInputFormat[K,V]](
     
       private val serializableConfOpt = conf.map(new SerializableWritable(_))
     
    +  /**
    +   * Minimum duration of remembering the information of selected files. Defaults to 1 minute.
    +   *
    +   * Files with mod times older than this "window" of remembering will be ignored. So if new
    +   * files are visible within this window, then the file will get selected in the next batch.
    +   */
    +  private val minRememberDurationMin = Minutes(ssc.conf
    --- End diff --
    
    The continuation indent looks a bit off here. Maybe just start the next line with `Minutes(...`.
    I think we might want to put this change on hold until SPARK-5931 is resolved, since it will give some proper syntax for time-based properties that you can use here.



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/5438



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-92920531
  
    That looks right. It still probably needs a nod from @tdas, since it effectively adds a new API (a config param).



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by emres <gi...@git.apache.org>.
Github user emres commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-91205733
  
    @srowen I've pushed to my branch SPARK-3276 again. Is it acceptable now?



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by jhu-chang <gi...@git.apache.org>.
Github user jhu-chang commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-91484932
  
    This approach affects all file sources in one project; there may be cases where some file streams need to include old files while others do not. Would a per-stream parameter be more suitable for this?



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by emres <gi...@git.apache.org>.
Github user emres commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-93419072
  
    @srowen thanks for checking. @tdas does it look OK to you, too? 
    
    One thing I've realized is that I did not touch streaming programming guide, and I think I should document this. But after checking https://spark.apache.org/docs/latest/streaming-programming-guide.html, I could not be sure where to document this property. Where is the most appropriate place for documenting ``spark.streaming.minRememberDuration``? Should I put it in the table given at https://spark.apache.org/docs/latest/configuration.html#spark-streaming?



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5438#discussion_r28226302
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala ---
    @@ -80,6 +80,16 @@ class FileInputDStream[K, V, F <: NewInputFormat[K,V]](
     
       private val serializableConfOpt = conf.map(new SerializableWritable(_))
     
    +  /**
    +   * Minimum duration of remembering the information of selected files. Defaults to 1 minute.
    +   *
    +   * Files with mod times older than this "window" of remembering will be ignored. So if new
    +   * files are visible within this window, then the file will get selected in the next batch.
    +   */
    +  private val minRememberDurationMin = Minutes(ssc.sparkContext.getConf
    +                                               .get("spark.streaming.minRememberDurationMin", "1")
    +                                               .toLong)
    --- End diff --
    
    I think you can use `ssc.conf` rather than `ssc.sparkContext.getConf`; also, SparkConf has an API `getLong` which can be used directly.
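
    A typed accessor like the one suggested avoids the `get(..., "1").toLong` round-trip. The sketch below shows the shape of such a `getLong` on a `Map`-backed stand-in (`ConfLike` is illustrative, not Spark's `SparkConf`).

```scala
// Stand-in conf with a getLong-style accessor: return the parsed value if the
// key is present, otherwise the typed default.
class ConfLike(settings: Map[String, String] = Map.empty) {
  def getLong(key: String, default: Long): Long =
    settings.get(key).map(_.toLong).getOrElse(default)
}
```

This keeps the default as a `Long` rather than a string, so the call site reads as `conf.getLong("spark.streaming.minRememberDurationMin", 1L)`.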



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-91202293
  
    (You can just push to this PR rather than close it.)



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-94979442
  
    So, I think I got away with this one. I tested it locally and it passes, and I see that subsequent pull requests in Jenkins succeed, and ones that failed did so for unrelated reasons. So this does actually pass tests. Of course it was an unintentional mistake to proceed before seeing the Jenkins result on this one explicitly, but no actual harm done in this case. 



Re: [GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by Sean Owen <so...@cloudera.com>.
Yes, doesn't this need access to the actual configured conf object from
the context? This is just making a new conf object.
On Apr 9, 2015 8:08 AM, "emres" <gi...@git.apache.org> wrote:

> Github user emres commented on a diff in the pull request:
>
>     https://github.com/apache/spark/pull/5438#discussion_r28055824
>
>     --- Diff:
> streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
> ---
>     @@ -331,19 +331,22 @@ private[streaming]
>      object FileInputDStream {
>
>        /**
>     -   * Minimum duration of remembering the information of selected
> files. Files with mod times
>     -   * older than this "window" of remembering will be ignored. So if
> new files are visible
>     -   * within this window, then the file will get selected in the next
> batch.
>     +   * Minimum duration of remembering the information of selected
> files. Defaults to 1 minute.
>     +   *
>     +   * Files with mod times older than this "window" of remembering
> will be ignored. So if new
>     +   * files are visible within this window, then the file will get
> selected in the next batch.
>         */
>     -  private val MIN_REMEMBER_DURATION = Minutes(1)
>     +  private val minRememberDurationMin = Minutes(new SparkConf()
>     --- End diff --
>
>     @srowen I'm a little confused. Do you mean something like: "inside
> ``private[streaming] class FileInputDStream[K, V, F <:
> NewInputFormat[K,V]]`` class create a ``private val conf = new
> SparkConf()`` and then use that ``conf`` inside ``private[streaming] object
> FileInputDStream``"?
>
>     Or something totally different?
>
>
>

[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by emres <gi...@git.apache.org>.
Github user emres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5438#discussion_r28055824
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala ---
    @@ -331,19 +331,22 @@ private[streaming]
     object FileInputDStream {
     
       /**
    -   * Minimum duration of remembering the information of selected files. Files with mod times
    -   * older than this "window" of remembering will be ignored. So if new files are visible
    -   * within this window, then the file will get selected in the next batch.
    +   * Minimum duration of remembering the information of selected files. Defaults to 1 minute.
    +   *
    +   * Files with mod times older than this "window" of remembering will be ignored. So if new
    +   * files are visible within this window, then the file will get selected in the next batch.
        */
    -  private val MIN_REMEMBER_DURATION = Minutes(1)
    +  private val minRememberDurationMin = Minutes(new SparkConf()
    --- End diff --
    
    @srowen I'm a little confused. Do you mean something like: "inside ``private[streaming] class FileInputDStream[K, V, F <: NewInputFormat[K,V]]`` class create a ``private val conf = new SparkConf()`` and then use that ``conf`` inside ``private[streaming] object FileInputDStream``"?
    
    Or something totally different?



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5438#discussion_r28053037
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala ---
    @@ -331,11 +331,13 @@ private[streaming]
     object FileInputDStream {
     
       /**
    -   * Minimum duration of remembering the information of selected files. Files with mod times
    -   * older than this "window" of remembering will be ignored. So if new files are visible
    -   * within this window, then the file will get selected in the next batch.
    +   * Minimum duration of remembering the information of selected files. Defaults to 1 minute.
    +   *
    +   * Files with mod times older than this "window" of remembering will be ignored. So if new
    +   * files are visible within this window, then the file will get selected in the next batch.
        */
    -  private val MIN_REMEMBER_DURATION = Minutes(1)
    +  private val minRememberDuration = new SparkConf().get("spark.streaming.minRememberDuration", "1")
    --- End diff --
    
    Note https://issues.apache.org/jira/browse/SPARK-5931 here which would update time properties to be specifiable like "3s". It isn't updating property names, but we could update a new one. I suggest this property name include the units. 
    
    It looks like it's minutes here, so `spark.streaming.minRememberDurationMin`. Can it be seconds to allow finer granularity or is that unreasonable?
    
    Why two fields here? Just one, `minRememberDurationMin` (or `Sec`), should be needed.



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by emres <gi...@git.apache.org>.
Github user emres commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-92292941
  
    Based on the comment of @jerryshao I prepared another commit, moving the ``minRememberDurationMin`` from the companion object to the class. @srowen does it look better now? Anything else to be done?



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-94934234
  
    Oh, darn it. I merged this because I thought this one had passed tests, but on looking back at it, it has not yet. I will keep a close eye on this and make sure it actually does, and if not, will revert it right away.



[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by emres <gi...@git.apache.org>.
Github user emres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5438#discussion_r28053943
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala ---
    @@ -331,11 +331,13 @@ private[streaming]
     object FileInputDStream {
     
       /**
    -   * Minimum duration of remembering the information of selected files. Files with mod times
    -   * older than this "window" of remembering will be ignored. So if new files are visible
    -   * within this window, then the file will get selected in the next batch.
    +   * Minimum duration of remembering the information of selected files. Defaults to 1 minute.
    +   *
    +   * Files with mod times older than this "window" of remembering will be ignored. So if new
    +   * files are visible within this window, then the file will get selected in the next batch.
        */
    -  private val MIN_REMEMBER_DURATION = Minutes(1)
    +  private val minRememberDuration = new SparkConf().get("spark.streaming.minRememberDuration", "1")
    --- End diff --
    
    @srowen changing the property name to ``spark.streaming.minRememberDurationMin`` makes sense with regard to consistency in naming. I will also get rid of the two fields. I'll open another pull request that reflects those changes.
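To make the pattern under discussion concrete, here is a minimal, hypothetical sketch of the lookup-with-default idea. ``MiniConf`` is a stand-in for Spark's ``SparkConf`` (so the snippet runs without Spark on the classpath); the property name and 1-minute default follow the thread, but this is not Spark's actual implementation.

```scala
// MiniConf mimics SparkConf#getLong: look up a string setting and fall back
// to a default when the key is absent. Stand-in only, not Spark code.
class MiniConf(settings: Map[String, String] = Map.empty) {
  def getLong(key: String, default: Long): Long =
    settings.get(key).map(_.toLong).getOrElse(default)
}

object MinRememberDemo extends App {
  // When the user sets the property, its value wins...
  val conf = new MiniConf(Map("spark.streaming.minRememberDurationMin" -> "5"))
  println(conf.getLong("spark.streaming.minRememberDurationMin", 1L)) // prints 5

  // ...otherwise the 1-minute default applies.
  println(new MiniConf().getLong("spark.streaming.minRememberDurationMin", 1L)) // prints 1
}
```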




[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5438#discussion_r28055053
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala ---
    @@ -331,19 +331,22 @@ private[streaming]
     object FileInputDStream {
     
       /**
    -   * Minimum duration of remembering the information of selected files. Files with mod times
    -   * older than this "window" of remembering will be ignored. So if new files are visible
    -   * within this window, then the file will get selected in the next batch.
    +   * Minimum duration of remembering the information of selected files. Defaults to 1 minute.
    +   *
    +   * Files with mod times older than this "window" of remembering will be ignored. So if new
    +   * files are visible within this window, then the file will get selected in the next batch.
        */
    -  private val MIN_REMEMBER_DURATION = Minutes(1)
    +  private val minRememberDurationMin = Minutes(new SparkConf()
    --- End diff --
    
    This much looks better, though now I also see that we need a real `SparkConf` here somehow. I don't think this is a global, static property of the object anymore. I think this would have to live in the `FileInputDStream` class now?
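A hypothetical sketch of the move being requested (``MiniConf`` again stands in for Spark's ``SparkConf``, and the class and field names are illustrative, not the real ``FileInputDStream`` code):

```scala
// Stand-in for SparkConf#getLong. Not Spark's actual class.
class MiniConf(settings: Map[String, String] = Map.empty) {
  def getLong(key: String, default: Long): Long =
    settings.get(key).map(_.toLong).getOrElse(default)
}

class FileInputDStreamSketch(conf: MiniConf) {
  // Instance-level: read at construction time from this stream's own conf,
  // since the value now depends on a per-application SparkConf.
  val minRememberDurationMin: Long =
    conf.getLong("spark.streaming.minRememberDurationMin", 1L)
}

object FileInputDStreamSketch {
  // Only a true constant can stay in the companion object.
  val DefaultMinRememberDurationMin: Long = 1L
}
```

Here a stream constructed with an empty conf sees the 1-minute default, while one whose conf sets the key reads the configured value.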




[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by emres <gi...@git.apache.org>.
Github user emres commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-92298676
  
    Based on the comment of @jerryshao, the code now uses ``ssc.conf`` rather than ``ssc.sparkContext.getConf``, and also calls the ``getLong`` method directly.




[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-94935289
  
    Jenkins, test this please




[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by emres <gi...@git.apache.org>.
Github user emres commented on the pull request:

    https://github.com/apache/spark/pull/5438#issuecomment-92342120
  
    @srowen I've switched back to
    
    * ``spark.streaming.minRememberDuration`` as the property name (with a default value of 60 seconds)
    * ``minRememberDuration``  as the corresponding variable name
    
    If these final changes are OK, then I'll be waiting for [SPARK-5931](https://issues.apache.org/jira/browse/SPARK-5931) to be resolved.
    





[GitHub] spark pull request: SPARK-3276 Added a new configuration spark.str...

Posted by emres <gi...@git.apache.org>.
Github user emres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5438#discussion_r28225630
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala ---
    @@ -331,19 +331,22 @@ private[streaming]
     object FileInputDStream {
     
       /**
    -   * Minimum duration of remembering the information of selected files. Files with mod times
    -   * older than this "window" of remembering will be ignored. So if new files are visible
    -   * within this window, then the file will get selected in the next batch.
    +   * Minimum duration of remembering the information of selected files. Defaults to 1 minute.
    +   *
    +   * Files with mod times older than this "window" of remembering will be ignored. So if new
    +   * files are visible within this window, then the file will get selected in the next batch.
        */
    -  private val MIN_REMEMBER_DURATION = Minutes(1)
    +  private val minRememberDurationMin = Minutes(new SparkConf()
    --- End diff --
    
    Based on the comment of @jerryshao, I prepared another commit moving ``minRememberDurationMin`` from the companion object to the class. @srowen does it look better now? Anything else to be done?

