Posted to commits@spark.apache.org by sr...@apache.org on 2021/02/28 23:02:05 UTC
[spark] branch master updated: [SPARK-34415][ML] Python example
This is an automated email from the ASF dual-hosted git repository.
srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 5a48eb8 [SPARK-34415][ML] Python example
5a48eb8 is described below
commit 5a48eb8d00faee3a7c8f023c0699296e22edb893
Author: Phillip Henry <Ph...@gmail.com>
AuthorDate: Sun Feb 28 17:01:13 2021 -0600
[SPARK-34415][ML] Python example
Missing Python example file for [SPARK-34415][ML] Randomization in hyperparameter optimization
(https://github.com/apache/spark/pull/31535)
### What changes were proposed in this pull request?
For some reason (probably me being silly), the file examples/src/main/python/ml/model_selection_random_hyperparameters_example.py was not pushed in a previous PR.
This PR restores that file.
### Why are the changes needed?
A single file (examples/src/main/python/ml/model_selection_random_hyperparameters_example.py) should have been pushed as part of SPARK-34415 but was not. Its absence was causing lint errors, as highlighted by dongjoon-hyun; consequently, srowen asked for a new PR.
### Does this PR introduce _any_ user-facing change?
No, it merely restores a file that was overlooked in SPARK-34415.
### How was this patch tested?
By running:
`bin/spark-submit examples/src/main/python/ml/model_selection_random_hyperparameters_example.py`
Closes #31687 from PhillHenry/SPARK-34415_model_selection_random_hyperparameters_example.
Authored-by: Phillip Henry <Ph...@gmail.com>
Signed-off-by: Sean Owen <sr...@gmail.com>
---
...del_selection_random_hyperparameters_example.py | 66 ++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/examples/src/main/python/ml/model_selection_random_hyperparameters_example.py b/examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
new file mode 100644
index 0000000..b436341
--- /dev/null
+++ b/examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
@@ -0,0 +1,66 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+This example uses random hyperparameters to perform model selection.
+Run with:
+
+ bin/spark-submit examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
+"""
+# $example on$
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamRandomBuilder, CrossValidator
+# $example off$
+from pyspark.sql import SparkSession
+
+if __name__ == "__main__":
+    spark = SparkSession \
+        .builder \
+        .appName("TrainValidationSplit") \
+        .getOrCreate()
+
+    # $example on$
+    data = spark.read.format("libsvm") \
+        .load("data/mllib/sample_linear_regression_data.txt")
+
+    lr = LinearRegression(maxIter=10)
+
+    # We sample the regularization parameter logarithmically over the range [0.01, 1.0].
+    # This means that values around 0.01, 0.1 and 1.0 are roughly equally likely.
+    # Note that both bounds must be greater than zero, as the log10 of zero is negative infinity.
+    # We sample the ElasticNet mixing parameter uniformly over the range [0, 1].
+    # Note that in real life, you'd choose more than the 5 samples we see below.
+    hyperparameters = ParamRandomBuilder() \
+        .addLog10Random(lr.regParam, 0.01, 1.0, 5) \
+        .addRandom(lr.elasticNetParam, 0.0, 1.0, 5) \
+        .addGrid(lr.fitIntercept, [False, True]) \
+        .build()
+
+    cv = CrossValidator(estimator=lr,
+                        estimatorParamMaps=hyperparameters,
+                        evaluator=RegressionEvaluator(),
+                        numFolds=2)
+
+    model = cv.fit(data)
+    bestModel = model.bestModel
+    print("Optimal model has regParam = {}, elasticNetParam = {}, fitIntercept = {}"
+          .format(bestModel.getRegParam(), bestModel.getElasticNetParam(),
+                  bestModel.getFitIntercept()))
+
+    # $example off$
+    spark.stop()
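The comments in the example describe two sampling schemes (log10-uniform for `regParam`, uniform for `elasticNetParam`) mixed with an ordinary grid for `fitIntercept`. As a rough, pure-Python illustration of that idea (not Spark's implementation), the sketch below uses hypothetical helpers `log10_uniform`, `uniform`, and `build_param_maps`. It assumes, as the example's mixing of `addGrid` with the random methods suggests, that the randomly drawn values are treated as grid candidates and combined by a Cartesian product, which would give 5 x 5 x 2 = 50 parameter maps.

```python
import itertools
import math
import random


def log10_uniform(low, high, n, rng):
    """Hypothetical helper: draw n values whose log10 is uniform on [log10(low), log10(high)].

    Both bounds must be positive; math.log10 raises ValueError for zero or
    negative input (mathematically, the log of zero is negative infinity).
    """
    if low <= 0 or high <= 0:
        raise ValueError("log-scale bounds must be greater than zero")
    lo_exp, hi_exp = math.log10(low), math.log10(high)
    # Sample the exponent uniformly, then map it back to the original scale.
    return [10 ** rng.uniform(lo_exp, hi_exp) for _ in range(n)]


def uniform(low, high, n, rng):
    """Hypothetical helper: draw n values uniformly on [low, high]."""
    return [rng.uniform(low, high) for _ in range(n)]


def build_param_maps(candidates):
    """Hypothetical helper: Cartesian product of per-parameter candidate lists.

    Returns one dict per combination, loosely mirroring a list of ParamMaps.
    """
    names = list(candidates)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(candidates[name] for name in names))]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the sketch is reproducible
    candidates = {
        "regParam": log10_uniform(0.01, 1.0, 5, rng),
        "elasticNetParam": uniform(0.0, 1.0, 5, rng),
        "fitIntercept": [False, True],
    }
    param_maps = build_param_maps(candidates)
    num_folds = 2
    print(len(param_maps))              # 5 * 5 * 2 combinations
    print(len(param_maps) * num_folds)  # model fits during cross-validation
```

Sampling on a log scale puts roughly half of the draws in [0.01, 0.1) and half in [0.1, 1.0], whereas a plain uniform draw over [0.01, 1.0] would put only about 9% of them below 0.1. Under the same Cartesian-product assumption, `numFolds=2` means 100 model fits before `CrossValidator` refits the best parameter map on the full dataset.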