Posted to reviews@spark.apache.org by jkbradley <gi...@git.apache.org> on 2016/04/05 03:45:24 UTC

[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] deleteLastCheckpoint ...

GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/12166

    [SPARK-13048][ML][MLLIB] deleteLastCheckpoint option for LDA EM optimizer

    ## What changes were proposed in this pull request?
    
    The EMLDAOptimizer should generally not delete its last checkpoint since that can cause failures when DistributedLDAModel methods are called (if any partitions need to be recovered from the checkpoint).
    
    This PR adds a "deleteLastCheckpoint" option which defaults to false.  This is a change in behavior from Spark 1.6, in that the last checkpoint will not be removed by default.
    
    This involves adding the deleteLastCheckpoint option to both spark.ml and spark.mllib, and modifying PeriodicCheckpointer to support the option.
    
    This also:
    * Makes MLlibTestSparkContext extend TempDirectory and set the checkpointDir to tempDir
    * Updates LibSVMRelationSuite because of a name conflict with "tempDir" (and fixes a bug where it failed to delete a temp directory)
    
    ## How was this patch tested?
    
    Added 2 new unit tests to spark.ml LDASuite, which calls into spark.mllib.
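
As a rough illustration of the behavior described above, here is a toy model in plain Scala of a periodic checkpointer that either keeps or deletes its last checkpoint when training ends. Everything in it (`ToyCheckpointer`, `update`, `finish`, the `checkpoint-N` names) is hypothetical and is not Spark's PeriodicCheckpointer. The flag is called `keepLastCheckpoint` to match the name the option takes later in this thread.

```scala
import scala.collection.mutable

// Toy stand-in for a periodic checkpointer. All names are hypothetical;
// this is NOT Spark's PeriodicCheckpointer, only a sketch of the idea.
final class ToyCheckpointer(interval: Int) {
  private val files = mutable.Queue.empty[String]
  private var updates = 0

  /** Record one training iteration. Every `interval` updates, write a new
    * checkpoint and drop the older ones so only the newest remains. */
  def update(): Unit = {
    updates += 1
    if (updates % interval == 0) {
      files.enqueue(s"checkpoint-$updates")
      while (files.size > 1) files.dequeue()
    }
  }

  /** Called once when training ends. Returns the checkpoints still on disk. */
  def finish(keepLastCheckpoint: Boolean): Seq[String] = {
    if (!keepLastCheckpoint) files.clear()
    files.toList
  }
}

object ToyCheckpointerDemo {
  /** Simulate 20 iterations with a checkpoint every 5 iterations. */
  def run(keepLastCheckpoint: Boolean): Seq[String] = {
    val cp = new ToyCheckpointer(interval = 5)
    (1 to 20).foreach(_ => cp.update())
    cp.finish(keepLastCheckpoint)
  }
}
```

With the flag set to true, the model returned by training can still fall back on the surviving checkpoint if a partition is lost. With it set to false, nothing remains to recover from.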

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark emlda-save-checkpoint

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12166.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12166
    
----
commit 5d1c89d1f1732266cbcc00f709a81fc06917ae73
Author: Joseph K. Bradley <jo...@databricks.com>
Date:   2016-04-05T01:34:35Z

    Added deleteLastCheckpoint option to LDA, defaulting to false

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-206087743
  
    **[Test build #2757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2757/consoleFull)** for PR 12166 at commit [`7e4e96e`](https://github.com/apache/spark/commit/7e4e96eafbb1c963105bf059ea7767f970d6b91f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait MLlibTestSparkContext extends TempDirectory `




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-206050630
  
    **[Test build #2757 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2757/consoleFull)** for PR 12166 at commit [`7e4e96e`](https://github.com/apache/spark/commit/7e4e96eafbb1c963105bf059ea7767f970d6b91f).




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205621463
  
    **[Test build #54939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54939/consoleFull)** for PR 12166 at commit [`4f4b12e`](https://github.com/apache/spark/commit/4f4b12ee4afe182ba49d6d98690671f6a6c2caab).




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205620748
  
    OK should be good to go now




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12166#discussion_r58485037
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -758,6 +816,10 @@ class LDA @Since("1.6.0") (
       @Since("1.6.0")
       def setOptimizeDocConcentration(value: Boolean): this.type = set(optimizeDocConcentration, value)
     
    +  /** @group expertSetParam */
    +  @Since("2.0.0")
    +  def setKeepLastCheckpoint(value: Boolean): this.type = set(keepLastCheckpoint, value)
    +
    --- End diff --
    
    Do we need to keep the setter in model?




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205974236
  
    **[Test build #55013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55013/consoleFull)** for PR 12166 at commit [`7e4e96e`](https://github.com/apache/spark/commit/7e4e96eafbb1c963105bf059ea7767f970d6b91f).




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] deleteLastCheckpoint ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205594730
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54921/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-207046111
  
    Fixed merge conflicts in MimaExcludes.scala by rebasing.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205978307
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-207186805
  
    Merging with master
    @holdenk @hhbyyh Thanks for taking a look!




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12166#discussion_r58582132
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -758,6 +816,10 @@ class LDA @Since("1.6.0") (
       @Since("1.6.0")
       def setOptimizeDocConcentration(value: Boolean): this.type = set(optimizeDocConcentration, value)
     
    +  /** @group expertSetParam */
    +  @Since("2.0.0")
    +  def setKeepLastCheckpoint(value: Boolean): this.type = set(keepLastCheckpoint, value)
    +
    --- End diff --
    
    I don't think so.  Once there is a model, the decision about keeping the last checkpoint has already been made.  Users can manage the checkpoint via the deletion method though.
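
The split described here (the option set on the estimator, cleanup done on the fitted model) might look roughly like the sketch below. It assumes Spark 2.0 or later on the classpath and a local master. The toy data, the object name `KeepLastCheckpointSketch`, and the caller-supplied `checkpointDir` are made up for illustration. `setKeepLastCheckpoint`, `getCheckpointFiles`, and `deleteCheckpointFiles` are the names that appear in the diffs in this thread.

```scala
import org.apache.spark.ml.clustering.{DistributedLDAModel, LDA}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object KeepLastCheckpointSketch {
  /** Trains a tiny EM LDA model and returns the checkpoint files reported
    * before and after manual cleanup. `checkpointDir` is a caller-chosen path. */
  def trainAndCleanUp(checkpointDir: String): (Array[String], Array[String]) = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("keepLastCheckpoint sketch")
      .getOrCreate()
    try {
      // Checkpointing only happens when a checkpoint directory is set.
      spark.sparkContext.setCheckpointDir(checkpointDir)

      // Made-up term-count vectors, one row per document.
      val docs = spark.createDataFrame(Seq(
        (0L, Vectors.dense(1.0, 0.0, 3.0)),
        (1L, Vectors.dense(0.0, 2.0, 1.0)),
        (2L, Vectors.dense(4.0, 1.0, 0.0))
      )).toDF("id", "features")

      // The keep/delete decision is made on the estimator, before fitting.
      val lda = new LDA()
        .setK(2)
        .setMaxIter(3)
        .setOptimizer("em")
        .setCheckpointInterval(1)
        .setKeepLastCheckpoint(true)

      // With the EM optimizer, fit is expected to return a DistributedLDAModel.
      val model = lda.fit(docs).asInstanceOf[DistributedLDAModel]

      val before = model.getCheckpointFiles // whatever training left behind
      model.deleteCheckpointFiles()         // manual cleanup on the model
      val after = model.getCheckpointFiles
      (before, after)
    } finally {
      spark.stop()
    }
  }
}
```

Keeping the setter off the model matches the reasoning above: by the time a model exists, training has already kept or deleted the checkpoint, so the only remaining action is cleanup.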




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-206006902
  
    This change looks good to me pending tests/MiMa and possibly user verification. We should also either add a follow-up JIRA for the Python API or add it in this PR.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-207046853
  
    **[Test build #55234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55234/consoleFull)** for PR 12166 at commit [`702acab`](https://github.com/apache/spark/commit/702acabd03b6f20071541563ed89bf52419f4b1f).




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205621012
  
    @hhbyyh @holdenk Would you mind taking a look please?  Thanks!




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205954120
  
    **[Test build #55001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55001/consoleFull)** for PR 12166 at commit [`59904c4`](https://github.com/apache/spark/commit/59904c441a57a22465e3a2b338f1867ad97f5bdd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205914706
  
    **[Test build #55001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55001/consoleFull)** for PR 12166 at commit [`59904c4`](https://github.com/apache/spark/commit/59904c441a57a22465e3a2b338f1867ad97f5bdd).




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205954399
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55001/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-207080730
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55234/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205913961
  
    Updated!




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-206050511
  
    It looks like the MiMa test failures were spurious (Spark core).
    I'll make a follow-up JIRA for Python.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12166#discussion_r58582117
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -619,6 +651,31 @@ class DistributedLDAModel private[ml] (
       @Since("1.6.0")
       lazy val logPrior: Double = oldDistributedModel.logPrior
     
    +  private var _checkpointFiles: Array[String] = oldDistributedModel.checkpointFiles
    +
    +  /**
    +   * If using checkpointing and [[LDA.keepLastCheckpoint]] is set to true, then there may be
    +   * saved checkpoint files.  This method is provided so that users can manage those files.
    +   * Note that removing the checkpoints can cause failures if a partition is lost and is needed
    +   * by certain [[DistributedLDAModel]] methods.
    +   *
    +   * @return  Checkpoint files from training
    +   */
    +  @Since("2.0.0")
    +  def getCheckpointFiles: Array[String] = _checkpointFiles
    --- End diff --
    
    I'd like to give users a way to clean up manually, but I'll mark it as DeveloperApi.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12166#discussion_r58483636
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -619,6 +651,31 @@ class DistributedLDAModel private[ml] (
       @Since("1.6.0")
       lazy val logPrior: Double = oldDistributedModel.logPrior
     
    +  private var _checkpointFiles: Array[String] = oldDistributedModel.checkpointFiles
    +
    +  /**
    +   * If using checkpointing and [[LDA.keepLastCheckpoint]] is set to true, then there may be
    +   * saved checkpoint files.  This method is provided so that users can manage those files.
    +   * Note that removing the checkpoints can cause failures if a partition is lost and is needed
    +   * by certain [[DistributedLDAModel]] methods.
    +   *
    +   * @return  Checkpoint files from training
    +   */
    +  @Since("2.0.0")
    +  def getCheckpointFiles: Array[String] = _checkpointFiles
    --- End diff --
    
    Do we really want to expose this? It seems somewhat closely tied to the implementation - maybe we should mark this as a developer API?




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] deleteLastCheckpoint ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205620168
  
    Hold on...I'm also switching from deleteLastCheckpoint to keepLastCheckpoint




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12166#discussion_r58582125
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -258,7 +265,30 @@ private[clustering] trait LDAParams extends Params with HasFeaturesCol with HasM
       def getOptimizeDocConcentration: Boolean = $(optimizeDocConcentration)
     
       /**
    +   * For EM optimizer, if using checkpointing, this indicates whether to keep the last
    +   * checkpoint. If false, then the checkpoint will be deleted. Deleting the checkpoint can
    +   * cause failures if a data partition is lost, so set this bit with care.
    +   *
    +   * See [[DistributedLDAModel.getCheckpointFiles]] for getting remaining checkpoints and
    +   * [[DistributedLDAModel.deleteCheckpointFiles]] for removing remaining checkpoints.
    +   *
    --- End diff --
    
    Sounds good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] deleteLastCheckpoint ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205594723
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-206508811
  
    **[Test build #2760 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2760/consoleFull)** for PR 12166 at commit [`7e4e96e`](https://github.com/apache/spark/commit/7e4e96eafbb1c963105bf059ea7767f970d6b91f).




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205643719
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205978310
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55013/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12166




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-207080356
  
    **[Test build #55234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55234/consoleFull)** for PR 12166 at commit [`702acab`](https://github.com/apache/spark/commit/702acabd03b6f20071541563ed89bf52419f4b1f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait MLlibTestSparkContext extends TempDirectory `




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-207080727
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] deleteLastCheckpoint ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205588073
  
    **[Test build #54921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54921/consoleFull)** for PR 12166 at commit [`7c4e11b`](https://github.com/apache/spark/commit/7c4e11be8d7ec561175958b8838ba7de1964fb2d).




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-206564100
  
    **[Test build #2760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2760/consoleFull)** for PR 12166 at commit [`7e4e96e`](https://github.com/apache/spark/commit/7e4e96eafbb1c963105bf059ea7767f970d6b91f).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait MLlibTestSparkContext extends TempDirectory `




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205978280
  
    **[Test build #55013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55013/consoleFull)** for PR 12166 at commit [`7e4e96e`](https://github.com/apache/spark/commit/7e4e96eafbb1c963105bf059ea7767f970d6b91f).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait MLlibTestSparkContext extends TempDirectory `




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205643607
  
    **[Test build #54939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54939/consoleFull)** for PR 12166 at commit [`4f4b12e`](https://github.com/apache/spark/commit/4f4b12ee4afe182ba49d6d98690671f6a6c2caab).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205643720
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54939/




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205954395
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] deleteLastCheckpoint ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12166#issuecomment-205594585
  
    **[Test build #54921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54921/consoleFull)** for PR 12166 at commit [`7c4e11b`](https://github.com/apache/spark/commit/7c4e11be8d7ec561175958b8838ba7de1964fb2d).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12166#discussion_r58483718
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -258,7 +265,30 @@ private[clustering] trait LDAParams extends Params with HasFeaturesCol with HasM
       def getOptimizeDocConcentration: Boolean = $(optimizeDocConcentration)
     
       /**
    +   * For EM optimizer, if using checkpointing, this indicates whether to keep the last
    +   * checkpoint. If false, then the checkpoint will be deleted. Deleting the checkpoint can
    +   * cause failures if a data partition is lost, so set this bit with care.
    +   *
    +   * See [[DistributedLDAModel.getCheckpointFiles]] for getting remaining checkpoints and
    +   * [[DistributedLDAModel.deleteCheckpointFiles]] for removing remaining checkpoints.
    +   *
    --- End diff --
    
    Maybe add a note that, by default, reference tracking will eventually clean up the checkpoint regardless, so people don't set this to false unnecessarily.
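
For readers following the thread, here is a minimal sketch of how the parameter documented in the diff above might be used from spark.ml. It is not code from this pull request. It assumes a Spark 2.x build in which `LDA.setKeepLastCheckpoint` is available, and the checkpoint directory, object name, and toy term-count vectors are hypothetical values chosen only for illustration.

```scala
import org.apache.spark.ml.clustering.{DistributedLDAModel, LDA}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical driver object; the name is not part of Spark or this PR.
object KeepLastCheckpointSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("lda-keep-last-checkpoint-sketch")
      .getOrCreate()

    // Checkpointing is only performed when a checkpoint directory is set.
    // The path below is an assumption for this sketch.
    spark.sparkContext.setCheckpointDir("/tmp/lda-checkpoints")

    // Toy corpus: each row is a document given as a term-count vector.
    val docs = spark.createDataFrame(Seq(
      (0L, Vectors.dense(1.0, 2.0, 0.0, 3.0)),
      (1L, Vectors.dense(0.0, 1.0, 4.0, 1.0)),
      (2L, Vectors.dense(2.0, 0.0, 1.0, 2.0))
    )).toDF("id", "features")

    val lda = new LDA()
      .setK(2)
      .setMaxIter(10)
      .setOptimizer("em")           // the parameter applies to the EM optimizer
      .setCheckpointInterval(2)     // checkpoint every 2 iterations
      .setKeepLastCheckpoint(true)  // keep the final checkpoint after fitting

    // With the EM optimizer, fit() returns a DistributedLDAModel.
    val model = lda.fit(docs).asInstanceOf[DistributedLDAModel]

    // List whatever checkpoint files the model still references.
    model.getCheckpointFiles.foreach(println)

    // Remove the remaining checkpoint files once the model is no longer needed.
    model.deleteCheckpointFiles()

    spark.stop()
  }
}
```

Keeping the last checkpoint trades some storage for safety: if a partition of the fitted model is lost, it can be recovered from the checkpoint, and the caller removes the files explicitly with `deleteCheckpointFiles()` when done.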


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org