Posted to commits@spark.apache.org by sr...@apache.org on 2017/09/08 07:08:14 UTC

spark git commit: [SPARK-21915][ML][PYSPARK] Model 1 and Model 2 ParamMaps Missing

Repository: spark
Updated Branches:
  refs/heads/branch-2.2 4304d0bf0 -> 781a1f83c


[SPARK-21915][ML][PYSPARK] Model 1 and Model 2 ParamMaps Missing

cc: dongjoon-hyun HyukjinKwon

Error in PySpark example code:
/examples/src/main/python/ml/estimator_transformer_param_example.py

The original Scala code reads:
println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)

The parent here is the estimator, lr.

PySpark has no equivalent way to access a model's parent estimator, so the ParamMap must be extracted from lr directly.

The corrected code has been tested in Python, and returns values consistent with Scala.
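
For illustration, a minimal PySpark sketch of the workaround (assuming the `training` DataFrame defined in the example file):

    from pyspark.ml.classification import LogisticRegression

    lr = LogisticRegression(maxIter=10, regParam=0.01)
    model1 = lr.fit(training)
    # PySpark models do not expose the parent estimator the way Scala's
    # model1.parent does, so extract the ParamMap from the estimator itself:
    print(lr.extractParamMap())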

## What changes were proposed in this pull request?

Proposing to call extractParamMap() on the lr estimator instead of on model1 or model2, as sketched below.
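
Sketched in full, the two corrected print statements look like this; the extra= keyword of extractParamMap (signature extractParamMap(extra=None) in pyspark.ml.param.Params) merges the explicit overrides on top of lr's own params:

    print("Model 1 was fit using parameters: ")
    print(lr.extractParamMap())

    print("Model 2 was fit using parameters: ")
    print(lr.extractParamMap(extra=paramMapCombined))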

## How was this patch tested?

This patch was tested with Spark 2.1.0 by comparing the Scala and PySpark results. At present, PySpark prints nothing for those two print lines.

The corrected output for model2 in PySpark should be:

{Param(parent='LogisticRegression_4187be538f744d5a9090', name='tol', doc='the convergence tolerance for iterative algorithms (>= 0).'): 1e-06,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='elasticNetParam', doc='the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.'): 0.0,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='predictionCol', doc='prediction column name.'): 'prediction',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='featuresCol', doc='features column name.'): 'features',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='labelCol', doc='label column name.'): 'label',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='probabilityCol', doc='Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities.'): 'myProbability',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column name.'): 'rawPrediction',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='family', doc='The name of family which is a description of the label distribution to be used in the model. Supported options: auto, binomial, multinomial'): 'auto',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='fitIntercept', doc='whether to fit an intercept term.'): True,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='threshold', doc='Threshold in binary classification prediction, in range [0, 1]. If threshold and thresholds are both set, they must match.e.g. if threshold is p, then thresholds must be equal to [1-p, p].'): 0.55,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='aggregationDepth', doc='suggested depth for treeAggregate (>= 2).'): 2,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='maxIter', doc='max number of iterations (>= 0).'): 30,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='regParam', doc='regularization parameter (>= 0).'): 0.1,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='standardization', doc='whether to standardize the training features before fitting the model.'): True}
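
The overridden values visible above (maxIter 30, regParam 0.1, threshold 0.55, probabilityCol 'myProbability') come from the param dictionaries the example builds before the second fit, roughly as follows (a sketch reconstructed from the example file and the output above):

    # Specify params with a plain Python dictionary keyed by Param objects
    paramMap = {lr.maxIter: 20}
    paramMap[lr.maxIter] = 30  # overwrites the earlier maxIter
    paramMap.update({lr.regParam: 0.1, lr.threshold: 0.55})

    # Param maps are ordinary dicts, so they combine with dict operations
    paramMap2 = {lr.probabilityCol: "myProbability"}
    paramMapCombined = paramMap.copy()
    paramMapCombined.update(paramMap2)

    # paramMapCombined overrides all params set earlier via lr.set* methods
    model2 = lr.fit(training, paramMapCombined)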

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: MarkTab marktab.net <ma...@users.noreply.github.com>

Closes #19152 from marktab/branch-2.2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/781a1f83
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/781a1f83
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/781a1f83

Branch: refs/heads/branch-2.2
Commit: 781a1f83c538a80ce1f1876e4786b02cb7984e16
Parents: 4304d0b
Author: MarkTab marktab.net <ma...@users.noreply.github.com>
Authored: Fri Sep 8 08:08:09 2017 +0100
Committer: Sean Owen <so...@cloudera.com>
Committed: Fri Sep 8 08:08:09 2017 +0100

----------------------------------------------------------------------
 .../src/main/python/ml/estimator_transformer_param_example.py    | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/781a1f83/examples/src/main/python/ml/estimator_transformer_param_example.py
----------------------------------------------------------------------
diff --git a/examples/src/main/python/ml/estimator_transformer_param_example.py b/examples/src/main/python/ml/estimator_transformer_param_example.py
index eb21051..929bd25 100644
--- a/examples/src/main/python/ml/estimator_transformer_param_example.py
+++ b/examples/src/main/python/ml/estimator_transformer_param_example.py
@@ -53,7 +53,7 @@ if __name__ == "__main__":
     # This prints the parameter (name: value) pairs, where names are unique IDs for this
     # LogisticRegression instance.
     print("Model 1 was fit using parameters: ")
-    print(model1.extractParamMap())
+    print(lr.extractParamMap())
 
     # We may alternatively specify parameters using a Python dictionary as a paramMap
     paramMap = {lr.maxIter: 20}
@@ -69,7 +69,7 @@ if __name__ == "__main__":
     # paramMapCombined overrides all parameters set earlier via lr.set* methods.
     model2 = lr.fit(training, paramMapCombined)
     print("Model 2 was fit using parameters: ")
-    print(model2.extractParamMap())
+    print(lr.extractParamMap(extra=paramMapCombined))
 
     # Prepare test data
     test = spark.createDataFrame([


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org