Posted to commits@spark.apache.org by fe...@apache.org on 2018/01/10 07:32:56 UTC
spark git commit: [SPARK-22993][ML] Clarify HasCheckpointInterval param doc
Repository: spark
Updated Branches:
refs/heads/master eaac60a1e -> 70bcc9d5a
[SPARK-22993][ML] Clarify HasCheckpointInterval param doc
## What changes were proposed in this pull request?
Add a note to the `HasCheckpointInterval` parameter doc clarifying that this setting is ignored when no checkpoint directory has been set on the SparkContext.
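The behavior the note describes can be sketched in plain Python (an illustrative helper, not actual Spark code): a checkpoint interval only takes effect when a checkpoint directory is configured and the interval is not the -1 "disabled" sentinel.

```python
def should_checkpoint(iteration, checkpoint_interval, checkpoint_dir):
    """Sketch of the documented semantics: the interval is honored only
    when a checkpoint directory is set and checkpointing is enabled."""
    if checkpoint_dir is None or checkpoint_interval == -1:
        return False
    # Checkpoint every `checkpoint_interval` iterations.
    return iteration % checkpoint_interval == 0
```

For example, with an interval of 10 but no directory configured, no iteration ever checkpoints, which is exactly the silent behavior this doc note warns about.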
## How was this patch tested?
No tests necessary, just a doc update.
Author: sethah <sh...@cloudera.com>
Closes #20188 from sethah/als_checkpoint_doc.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/70bcc9d5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/70bcc9d5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/70bcc9d5
Branch: refs/heads/master
Commit: 70bcc9d5ae33d6669bb5c97db29087ccead770fb
Parents: eaac60a
Author: sethah <sh...@cloudera.com>
Authored: Tue Jan 9 23:32:47 2018 -0800
Committer: Felix Cheung <fe...@apache.org>
Committed: Tue Jan 9 23:32:47 2018 -0800
----------------------------------------------------------------------
R/pkg/R/mllib_recommendation.R | 2 ++
R/pkg/R/mllib_tree.R | 6 ++++++
.../org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala | 4 +++-
.../scala/org/apache/spark/ml/param/shared/sharedParams.scala | 4 ++--
python/pyspark/ml/param/_shared_params_code_gen.py | 5 +++--
python/pyspark/ml/param/shared.py | 4 ++--
6 files changed, 18 insertions(+), 7 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/70bcc9d5/R/pkg/R/mllib_recommendation.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/mllib_recommendation.R b/R/pkg/R/mllib_recommendation.R
index fa79424..5441c4a 100644
--- a/R/pkg/R/mllib_recommendation.R
+++ b/R/pkg/R/mllib_recommendation.R
@@ -48,6 +48,8 @@ setClass("ALSModel", representation(jobj = "jobj"))
#' @param numUserBlocks number of user blocks used to parallelize computation (> 0).
#' @param numItemBlocks number of item blocks used to parallelize computation (> 0).
#' @param checkpointInterval number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#' Note: this setting will be ignored if the checkpoint directory is not
+#' set.
#' @param ... additional argument(s) passed to the method.
#' @return \code{spark.als} returns a fitted ALS model.
#' @rdname spark.als
http://git-wip-us.apache.org/repos/asf/spark/blob/70bcc9d5/R/pkg/R/mllib_tree.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/mllib_tree.R b/R/pkg/R/mllib_tree.R
index 89a58bf..4e5ddf2 100644
--- a/R/pkg/R/mllib_tree.R
+++ b/R/pkg/R/mllib_tree.R
@@ -161,6 +161,8 @@ print.summary.decisionTree <- function(x) {
#' >= 1.
#' @param minInfoGain Minimum information gain for a split to be considered at a tree node.
#' @param checkpointInterval Param for set checkpoint interval (>= 1) or disable checkpoint (-1).
+#' Note: this setting will be ignored if the checkpoint directory is not
+#' set.
#' @param maxMemoryInMB Maximum memory in MB allocated to histogram aggregation.
#' @param cacheNodeIds If FALSE, the algorithm will pass trees to executors to match instances with
#' nodes. If TRUE, the algorithm will cache node IDs for each instance. Caching
@@ -382,6 +384,8 @@ setMethod("write.ml", signature(object = "GBTClassificationModel", path = "chara
#' @param minInstancesPerNode Minimum number of instances each child must have after split.
#' @param minInfoGain Minimum information gain for a split to be considered at a tree node.
#' @param checkpointInterval Param for set checkpoint interval (>= 1) or disable checkpoint (-1).
+#' Note: this setting will be ignored if the checkpoint directory is not
+#' set.
#' @param maxMemoryInMB Maximum memory in MB allocated to histogram aggregation.
#' @param cacheNodeIds If FALSE, the algorithm will pass trees to executors to match instances with
#' nodes. If TRUE, the algorithm will cache node IDs for each instance. Caching
@@ -595,6 +599,8 @@ setMethod("write.ml", signature(object = "RandomForestClassificationModel", path
#' @param minInstancesPerNode Minimum number of instances each child must have after split.
#' @param minInfoGain Minimum information gain for a split to be considered at a tree node.
#' @param checkpointInterval Param for set checkpoint interval (>= 1) or disable checkpoint (-1).
+#' Note: this setting will be ignored if the checkpoint directory is not
+#' set.
#' @param maxMemoryInMB Maximum memory in MB allocated to histogram aggregation.
#' @param cacheNodeIds If FALSE, the algorithm will pass trees to executors to match instances with
#' nodes. If TRUE, the algorithm will cache node IDs for each instance. Caching
http://git-wip-us.apache.org/repos/asf/spark/blob/70bcc9d5/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
index a5d57a1..6ad44af 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
@@ -63,7 +63,9 @@ private[shared] object SharedParamsCodeGen {
ParamDesc[Array[String]]("outputCols", "output column names"),
ParamDesc[Int]("checkpointInterval", "set checkpoint interval (>= 1) or " +
"disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed " +
- "every 10 iterations", isValid = "(interval: Int) => interval == -1 || interval >= 1"),
+ "every 10 iterations. Note: this setting will be ignored if the checkpoint directory " +
+ "is not set in the SparkContext",
+ isValid = "(interval: Int) => interval == -1 || interval >= 1"),
ParamDesc[Boolean]("fitIntercept", "whether to fit an intercept term", Some("true")),
ParamDesc[String]("handleInvalid", "how to handle invalid entries. Options are skip (which " +
"will filter out rows with bad values), or error (which will throw an error). More " +
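For reference, the `isValid` predicate carried through this hunk (`interval == -1 || interval >= 1`) can be mirrored as a plain Python check (illustrative only):

```python
def is_valid_checkpoint_interval(interval):
    # Mirrors the Scala isValid predicate: -1 disables checkpointing;
    # any other accepted value must be >= 1.
    return interval == -1 or interval >= 1
```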
http://git-wip-us.apache.org/repos/asf/spark/blob/70bcc9d5/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
index 13425da..be8b2f2 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
@@ -282,10 +282,10 @@ trait HasOutputCols extends Params {
trait HasCheckpointInterval extends Params {
/**
- * Param for set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations.
+ * Param for set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext.
* @group param
*/
- final val checkpointInterval: IntParam = new IntParam(this, "checkpointInterval", "set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations", (interval: Int) => interval == -1 || interval >= 1)
+ final val checkpointInterval: IntParam = new IntParam(this, "checkpointInterval", "set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext", (interval: Int) => interval == -1 || interval >= 1)
/** @group getParam */
final def getCheckpointInterval: Int = $(checkpointInterval)
http://git-wip-us.apache.org/repos/asf/spark/blob/70bcc9d5/python/pyspark/ml/param/_shared_params_code_gen.py
----------------------------------------------------------------------
diff --git a/python/pyspark/ml/param/_shared_params_code_gen.py b/python/pyspark/ml/param/_shared_params_code_gen.py
index d55d209..1d0f60a 100644
--- a/python/pyspark/ml/param/_shared_params_code_gen.py
+++ b/python/pyspark/ml/param/_shared_params_code_gen.py
@@ -121,8 +121,9 @@ if __name__ == "__main__":
("outputCol", "output column name.", "self.uid + '__output'", "TypeConverters.toString"),
("numFeatures", "number of features.", None, "TypeConverters.toInt"),
("checkpointInterval", "set checkpoint interval (>= 1) or disable checkpoint (-1). " +
- "E.g. 10 means that the cache will get checkpointed every 10 iterations.", None,
- "TypeConverters.toInt"),
+ "E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: " +
+ "this setting will be ignored if the checkpoint directory is not set in the SparkContext.",
+ None, "TypeConverters.toInt"),
("seed", "random seed.", "hash(type(self).__name__)", "TypeConverters.toInt"),
("tol", "the convergence tolerance for iterative algorithms (>= 0).", None,
"TypeConverters.toFloat"),
http://git-wip-us.apache.org/repos/asf/spark/blob/70bcc9d5/python/pyspark/ml/param/shared.py
----------------------------------------------------------------------
diff --git a/python/pyspark/ml/param/shared.py b/python/pyspark/ml/param/shared.py
index e5c5ddf..813f7a5 100644
--- a/python/pyspark/ml/param/shared.py
+++ b/python/pyspark/ml/param/shared.py
@@ -281,10 +281,10 @@ class HasNumFeatures(Params):
class HasCheckpointInterval(Params):
"""
- Mixin for param checkpointInterval: set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations.
+ Mixin for param checkpointInterval: set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext.
"""
- checkpointInterval = Param(Params._dummy(), "checkpointInterval", "set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations.", typeConverter=TypeConverters.toInt)
+ checkpointInterval = Param(Params._dummy(), "checkpointInterval", "set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext.", typeConverter=TypeConverters.toInt)
def __init__(self):
super(HasCheckpointInterval, self).__init__()