Posted to commits@spark.apache.org by sr...@apache.org on 2021/02/28 23:02:05 UTC

[spark] branch master updated: [SPARK-34415][ML] Python example

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 5a48eb8  [SPARK-34415][ML] Python example
5a48eb8 is described below

commit 5a48eb8d00faee3a7c8f023c0699296e22edb893
Author: Phillip Henry <Ph...@gmail.com>
AuthorDate: Sun Feb 28 17:01:13 2021 -0600

    [SPARK-34415][ML] Python example
    
    Missing Python example file for [SPARK-34415][ML] Randomization in hyperparameter optimization
     (https://github.com/apache/spark/pull/31535)
    
    ### What changes were proposed in this pull request?
    For some reason (probably me being silly), the file examples/src/main/python/ml/model_selection_random_hyperparameters_example.py was not pushed in a previous PR.
    This PR restores that file.
    
    ### Why are the changes needed?
    A single file (examples/src/main/python/ml/model_selection_random_hyperparameters_example.py) should have been pushed as part of SPARK-34415 but was not. Its absence caused lint errors, as highlighted by dongjoon-hyun. Consequently, srowen asked for a new PR.
    
    ### Does this PR introduce _any_ user-facing change?
    No, it merely restores a file that was overlooked in SPARK-34415.
    
    ### How was this patch tested?
    By running:
    `bin/spark-submit examples/src/main/python/ml/model_selection_random_hyperparameters_example.py`
    
    Closes #31687 from PhillHenry/SPARK-34415_model_selection_random_hyperparameters_example.
    
    Authored-by: Phillip Henry <Ph...@gmail.com>
    Signed-off-by: Sean Owen <sr...@gmail.com>
---
 ...del_selection_random_hyperparameters_example.py | 66 ++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/examples/src/main/python/ml/model_selection_random_hyperparameters_example.py b/examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
new file mode 100644
index 0000000..b436341
--- /dev/null
+++ b/examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
@@ -0,0 +1,66 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+This example uses random hyperparameters to perform model selection.
+Run with:
+
+  bin/spark-submit examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
+"""
+# $example on$
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamRandomBuilder, CrossValidator
+# $example off$
+from pyspark.sql import SparkSession
+
+if __name__ == "__main__":
+    spark = SparkSession \
+        .builder \
+        .appName("ModelSelectionRandomHyperparametersExample") \
+        .getOrCreate()
+
+    # $example on$
+    data = spark.read.format("libsvm") \
+        .load("data/mllib/sample_linear_regression_data.txt")
+
+    lr = LinearRegression(maxIter=10)
+
+    # We sample the regularization parameter logarithmically over the range [0.01, 1.0].
+    # This means that values around 0.01, 0.1 and 1.0 are roughly equally likely.
+    # Note that both bounds must be greater than zero, as otherwise taking the logarithm gives an infinity.
+    # We sample the ElasticNet mixing parameter uniformly over the range [0, 1].
+    # Note that in real life, you'd choose more than the 5 samples we see below.
+    hyperparameters = ParamRandomBuilder() \
+        .addLog10Random(lr.regParam, 0.01, 1.0, 5) \
+        .addRandom(lr.elasticNetParam, 0.0, 1.0, 5) \
+        .addGrid(lr.fitIntercept, [False, True]) \
+        .build()
+
+    cv = CrossValidator(estimator=lr,
+                        estimatorParamMaps=hyperparameters,
+                        evaluator=RegressionEvaluator(),
+                        numFolds=2)
+
+    model = cv.fit(data)
+    bestModel = model.bestModel
+    print("Optimal model has regParam = {}, elasticNetParam = {}, fitIntercept = {}"
+          .format(bestModel.getRegParam(), bestModel.getElasticNetParam(),
+                  bestModel.getFitIntercept()))
+
+    # $example off$
+    spark.stop()
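
The comment in the example about sampling regParam logarithmically deserves a closer look: addLog10Random draws values that are spread evenly across orders of magnitude, so samples near 0.01, 0.1 and 1.0 are roughly equally likely. Below is a minimal, standalone sketch of that idea using only the Python standard library; sample_log10 and its lo/hi/n arguments are illustrative names for this sketch, not part of the ParamRandomBuilder API.

    import random
    from math import log10

    def sample_log10(lo, hi, n):
        # Draw uniformly on the log10 scale, then map back to the original scale,
        # so every decade within [lo, hi] is equally likely. Both lo and hi must
        # be greater than zero, since log10 of zero diverges to negative infinity.
        return [10 ** random.uniform(log10(lo), log10(hi)) for _ in range(n)]

    print(sample_log10(0.01, 1.0, 5))  # five draws spread across 0.01 .. 1.0

This also shows why a plain uniform draw over [0.01, 1.0] (as addRandom would give) is a poor fit for regParam: roughly 90% of such draws would land above 0.1, leaving the smaller regularization strengths almost unexplored.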


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org