Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/18 00:13:46 UTC

[GitHub] [spark] huaxingao commented on a change in pull request #27593: [SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper

URL: https://github.com/apache/spark/pull/27593#discussion_r380403796
 
 

 ##########
 File path: R/pkg/R/mllib_regression.R
 ##########
 @@ -540,3 +546,147 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c
           function(object, path, overwrite = FALSE) {
             write_internal(object, path, overwrite)
           })
+
+#' Linear Regression Model
+#'
+#' \code{spark.lm} fits a linear regression model against a SparkDataFrame.
+#' Users can call \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' @param data a \code{SparkDataFrame} of observations and labels for model fitting.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
+#'                operators are supported, including '~', '.', ':', '+', and '-'.
+#' @param maxIter maximum iteration number.
+#' @param regParam the regularization parameter.
+#' @param elasticNetParam the ElasticNet mixing parameter, in range [0, 1].
+#'        For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.
+#' @param tol convergence tolerance of iterations.
+#' @param standardization whether to standardize the training features before fitting the model.
+#' @param weightCol weight column name.
+#' @param aggregationDepth suggested depth for treeAggregate (>= 2).
+#' @param loss the loss function to be optimized. Supported options: "squaredError" and "huber".
+#' @param epsilon the shape parameter to control the amount of robustness.
+#' @param solver The solver algorithm for optimization.
+#'        Supported options: "l-bfgs", "normal" and "auto".
+#' @param stringIndexerOrderType how to order categories of a string feature column. This is used to
+#'                               decide the base level of a string feature as the last category
+#'                               after ordering is dropped when encoding strings. Supported options
+#'                               are "frequencyDesc", "frequencyAsc", "alphabetDesc", and
+#'                               "alphabetAsc". The default value is "frequencyDesc". When the
+#'                               ordering is set to "alphabetDesc", this drops the same category
+#'                               as R when encoding strings.
+#' @param ... additional arguments passed to the method.
+#' @return \code{spark.lm} returns a fitted Linear Regression Model.
+#'
+#' @rdname spark.lm
+#' @aliases spark.lm,SparkDataFrame,formula-method
+#' @name spark.lm
+#' @seealso \link{read.ml}
+#' @examples
+#' \dontrun{
+#' df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm")
+#'
+#' # fit Linear Regression Model
+#' model <- spark.lm(
+#'            df, label ~ features,
+#'            regParam = 0.01, maxIter = 10
+#'          )
+#'
+#' # get the summary of the model
+#' summary(model)
+#'
+#' # make predictions
+#' predictions <- predict(model, df)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.lm since 3.1.0
+setMethod("spark.lm", signature(data = "SparkDataFrame", formula = "formula"),
+          function(data, formula,
+                   maxIter = 100L, regParam = 0.0, elasticNetParam = 0.0,
+                   tol = 1e-6, standardization = TRUE,
+                   solver = c("auto", "l-bfgs", "normal"),
+                   weightCol = NULL, aggregationDepth = 2L,
+                   loss = c("squaredError", "huber"), epsilon = 1.35,
+                   stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
+                                              "alphabetDesc", "alphabetAsc")) {
+
+
 
 Review comment:
   nit: delete extra blank line? There are several other places that have two blank lines as well. 
