Posted to commits@spark.apache.org by do...@apache.org on 2021/08/24 20:39:11 UTC

[spark] branch master updated: Revert "[SPARK-34415][ML] Randomization in hyperparameter optimization"

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new de932f5  Revert "[SPARK-34415][ML] Randomization in hyperparameter optimization"
de932f5 is described below

commit de932f51ceb8b9805c26c7bd13c1cfb628d8128d
Author: Gengliang Wang <ge...@apache.org>
AuthorDate: Tue Aug 24 13:38:14 2021 -0700

    Revert "[SPARK-34415][ML] Randomization in hyperparameter optimization"
    
    ### What changes were proposed in this pull request?
    
    Revert https://github.com/apache/spark/commit/397b843890db974a0534394b1907d33d62c2b888 and https://github.com/apache/spark/commit/5a48eb8d00faee3a7c8f023c0699296e22edb893
    
    ### Why are the changes needed?
    
    As discussed in https://github.com/apache/spark/pull/33800#issuecomment-904140869, there is a correctness issue in the current implementation. Let's revert the code changes from branch 3.2 and fix it on the master branch later.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    CI tests
    
    Closes #33819 from gengliangwang/revert-SPARK-34415.
    
    Authored-by: Gengliang Wang <ge...@apache.org>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 docs/ml-tuning.md                                  |  36 +----
 ...elSelectionViaRandomHyperparametersExample.java |  83 ----------
 ...del_selection_random_hyperparameters_example.py |  66 --------
 ...lSelectionViaRandomHyperparametersExample.scala |  79 ----------
 .../spark/ml/tuning/ParamRandomBuilder.scala       | 160 --------------------
 .../spark/ml/tuning/ParamRandomBuilderSuite.scala  | 123 ---------------
 .../apache/spark/ml/tuning/RandomRangesSuite.scala | 168 ---------------------
 python/docs/source/reference/pyspark.ml.rst        |   1 -
 python/pyspark/ml/tests/test_tuning.py             | 105 +------------
 python/pyspark/ml/tuning.py                        |  48 +-----
 python/pyspark/ml/tuning.pyi                       |   5 -
 11 files changed, 3 insertions(+), 871 deletions(-)

diff --git a/docs/ml-tuning.md b/docs/ml-tuning.md
index e7940a3..3ddd185 100644
--- a/docs/ml-tuning.md
+++ b/docs/ml-tuning.md
@@ -71,44 +71,10 @@ for multiclass problems, a [`MultilabelClassificationEvaluator`](api/scala/org/a
 [`RankingEvaluator`](api/scala/org/apache/spark/ml/evaluation/RankingEvaluator.html) for ranking problems. The default metric used to
 choose the best `ParamMap` can be overridden by the `setMetricName` method in each of these evaluators.
 
-To help construct the parameter grid, users can use the [`ParamGridBuilder`](api/scala/org/apache/spark/ml/tuning/ParamGridBuilder.html) utility (see the *Cross-Validation* section below for an example).
+To help construct the parameter grid, users can use the [`ParamGridBuilder`](api/scala/org/apache/spark/ml/tuning/ParamGridBuilder.html) utility.
 By default, sets of parameters from the parameter grid are evaluated in serial. Parameter evaluation can be done in parallel by setting `parallelism` with a value of 2 or more (a value of 1 will be serial) before running model selection with `CrossValidator` or `TrainValidationSplit`.
 The value of `parallelism` should be chosen carefully to maximize parallelism without exceeding cluster resources, and larger values may not always lead to improved performance.  Generally speaking, a value up to 10 should be sufficient for most clusters.
 
-Alternatively, users can use the [`ParamRandomBuilder`](api/scala/org/apache/spark/ml/tuning/ParamRandomBuilder.html) utility.
-This has the same properties as `ParamGridBuilder` mentioned above, but hyperparameters are chosen at random within a user-defined range.
-The mathematical principle behind this is that, given enough samples, the probability that *no* sample lands near the optimum within a range tends to zero.
-Irrespective of the machine learning model, the expected number of samples needed to have at least one within 5% of the optimum is about 60.
-If this 5% volume falls between the points defined in a grid search, it will *never* be found by `ParamGridBuilder`.
-
-<div class="codetabs">
-
-<div data-lang="scala" markdown="1">
-
-Refer to the [`ParamRandomBuilder` Scala docs](api/scala/org/apache/spark/ml/tuning/ParamRandomBuilder.html) for details on the API.
-
-{% include_example scala/org/apache/spark/examples/ml/ModelSelectionViaRandomHyperparametersExample.scala %}
-</div>
-
-<div data-lang="java" markdown="1">
-
-Refer to the [`ParamRandomBuilder` Java docs](api/java/org/apache/spark/ml/tuning/ParamRandomBuilder.html) for details on the API.
-
-{% include_example java/org/apache/spark/examples/ml/JavaModelSelectionViaRandomHyperparametersExample.java %}
-</div>
-
-<div data-lang="python" markdown="1">
-
-Python users are recommended to look at Python libraries built specifically for hyperparameter tuning, such as Hyperopt.
-
-Refer to the [`ParamRandomBuilder` Python docs](api/python/reference/api/pyspark.ml.tuning.ParamRandomBuilder.html) for details on the API.
-
-{% include_example python/ml/model_selection_random_hyperparameters_example.py %}
-
-</div>
-
-</div>
-
 # Cross-Validation
 
 `CrossValidator` begins by splitting the dataset into a set of *folds* which are used as separate training and test datasets. E.g., with `$k=3$` folds, `CrossValidator` will generate 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing.  To evaluate a particular `ParamMap`, `CrossValidator` computes the average evaluation metric for the 3 `Model`s produced by fitting the `Estimator` on the 3 different (training, test) dataset pairs.
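
The "about 60" figure in the removed text above follows from a simple independence argument: if each random sample misses the top-5% region with probability 0.95, then all `$n$` samples miss it with probability `$0.95^n$`, so requiring `$1 - 0.95^n \geq 0.95$` gives `$n \geq \ln(0.05)/\ln(0.95) \approx 58.4$`, i.e. roughly 60 samples.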
diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaModelSelectionViaRandomHyperparametersExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaModelSelectionViaRandomHyperparametersExample.java
deleted file mode 100644
index 086920f..0000000
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaModelSelectionViaRandomHyperparametersExample.java
+++ /dev/null
@@ -1,83 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.examples.ml;
-
-// $example on$
-import org.apache.spark.ml.evaluation.RegressionEvaluator;
-import org.apache.spark.ml.param.ParamMap;
-import org.apache.spark.ml.regression.LinearRegression;
-import org.apache.spark.ml.tuning.*;
-import org.apache.spark.sql.Dataset;
-import org.apache.spark.sql.Row;
-import org.apache.spark.sql.SparkSession;
-// $example off$
-
-/**
- * A simple example demonstrating model selection using ParamRandomBuilder.
- *
- * Run with
- * {{{
- * bin/run-example ml.JavaModelSelectionViaRandomHyperparametersExample
- * }}}
- */
-public class JavaModelSelectionViaRandomHyperparametersExample {
-
-    public static void main(String[] args) {
-        SparkSession spark = SparkSession
-                .builder()
-                .appName("JavaModelSelectionViaTrainValidationSplitExample")
-                .getOrCreate();
-
-        // $example on$
-        Dataset<Row> data = spark.read().format("libsvm")
-                .load("data/mllib/sample_linear_regression_data.txt");
-
-        LinearRegression lr = new LinearRegression();
-
-        // We sample the regularization parameter logarithmically over the range [0.01, 1.0].
-        // This means that values around 0.01, 0.1 and 1.0 are roughly equally likely.
-        // Note that both limits must be greater than zero, as otherwise we'll get an infinity.
-        // We sample the ElasticNet mixing parameter uniformly over the range [0, 1].
-        // Note that in real life, you'd choose more than the 5 samples we see below.
-        ParamMap[] hyperparameters = new ParamRandomBuilder()
-                .addLog10Random(lr.regParam(), 0.01, 1.0, 5)
-                .addRandom(lr.elasticNetParam(), 0.0, 1.0, 5)
-                .addGrid(lr.fitIntercept())
-                .build();
-
-        System.out.println("hyperparameters:");
-        for (ParamMap param : hyperparameters) {
-            System.out.println(param);
-        }
-
-        CrossValidator cv = new CrossValidator()
-                .setEstimator(lr)
-                .setEstimatorParamMaps(hyperparameters)
-                .setEvaluator(new RegressionEvaluator())
-                .setNumFolds(3);
-        CrossValidatorModel cvModel = cv.fit(data);
-        LinearRegression parent = (LinearRegression)cvModel.bestModel().parent();
-
-        System.out.println("Optimal model has\n" + lr.regParam() + " = " + parent.getRegParam()
-                + "\n" + lr.elasticNetParam() + " = "+ parent.getElasticNetParam()
-                + "\n" + lr.fitIntercept() + " = " + parent.getFitIntercept());
-        // $example off$
-
-        spark.stop();
-    }
-}
diff --git a/examples/src/main/python/ml/model_selection_random_hyperparameters_example.py b/examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
deleted file mode 100644
index b436341..0000000
--- a/examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
+++ /dev/null
@@ -1,66 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-"""
-This example uses random hyperparameters to perform model selection.
-Run with:
-
-  bin/spark-submit examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
-"""
-# $example on$
-from pyspark.ml.evaluation import RegressionEvaluator
-from pyspark.ml.regression import LinearRegression
-from pyspark.ml.tuning import ParamRandomBuilder, CrossValidator
-# $example off$
-from pyspark.sql import SparkSession
-
-if __name__ == "__main__":
-    spark = SparkSession \
-        .builder \
-        .appName("TrainValidationSplit") \
-        .getOrCreate()
-
-    # $example on$
-    data = spark.read.format("libsvm") \
-        .load("data/mllib/sample_linear_regression_data.txt")
-
-    lr = LinearRegression(maxIter=10)
-
-    # We sample the regularization parameter logarithmically over the range [0.01, 1.0].
-    # This means that values around 0.01, 0.1 and 1.0 are roughly equally likely.
-    # Note that both limits must be greater than zero, as otherwise we'll get an infinity.
-    # We sample the ElasticNet mixing parameter uniformly over the range [0, 1].
-    # Note that in real life, you'd choose more than the 5 samples we see below.
-    hyperparameters = ParamRandomBuilder() \
-        .addLog10Random(lr.regParam, 0.01, 1.0, 5) \
-        .addRandom(lr.elasticNetParam, 0.0, 1.0, 5) \
-        .addGrid(lr.fitIntercept, [False, True]) \
-        .build()
-
-    cv = CrossValidator(estimator=lr,
-                        estimatorParamMaps=hyperparameters,
-                        evaluator=RegressionEvaluator(),
-                        numFolds=2)
-
-    model = cv.fit(data)
-    bestModel = model.bestModel
-    print("Optimal model has regParam = {}, elasticNetParam = {}, fitIntercept = {}"
-          .format(bestModel.getRegParam(), bestModel.getElasticNetParam(),
-                  bestModel.getFitIntercept()))
-
-    # $example off$
-    spark.stop()
diff --git a/examples/src/main/scala/org/apache/spark/examples/ml/ModelSelectionViaRandomHyperparametersExample.scala b/examples/src/main/scala/org/apache/spark/examples/ml/ModelSelectionViaRandomHyperparametersExample.scala
deleted file mode 100644
index 9d2c58bb..0000000
--- a/examples/src/main/scala/org/apache/spark/examples/ml/ModelSelectionViaRandomHyperparametersExample.scala
+++ /dev/null
@@ -1,79 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.examples.ml
-
-// $example on$
-import org.apache.spark.ml.evaluation.RegressionEvaluator
-import org.apache.spark.ml.regression.LinearRegression
-import org.apache.spark.ml.tuning.{CrossValidator, CrossValidatorModel, Limits, ParamRandomBuilder}
-import org.apache.spark.ml.tuning.RandomRanges._
-// $example off$
-import org.apache.spark.sql.SparkSession
-
-/**
- * A simple example demonstrating model selection using ParamRandomBuilder.
- *
- * Run with
- * {{{
- * bin/run-example ml.ModelSelectionViaRandomHyperparametersExample
- * }}}
- */
-object ModelSelectionViaRandomHyperparametersExample {
-  def main(args: Array[String]): Unit = {
-    val spark = SparkSession
-      .builder
-      .appName("ModelSelectionViaTrainValidationSplitExample")
-      .getOrCreate()
-    // scalastyle:off println
-    // $example on$
-    // Prepare training and test data.
-    val data = spark.read.format("libsvm").load("data/mllib/sample_linear_regression_data.txt")
-
-    val lr = new LinearRegression().setMaxIter(10)
-
-    // We sample the regularization parameter logarithmically over the range [0.01, 1.0].
-    // This means that values around 0.01, 0.1 and 1.0 are roughly equally likely.
-    // Note that both limits must be greater than zero, as otherwise we'll get an infinity.
-    // We sample the ElasticNet mixing parameter uniformly over the range [0, 1].
-    // Note that in real life, you'd choose more than the 5 samples we see below.
-    val hyperparameters = new ParamRandomBuilder()
-      .addLog10Random(lr.regParam, Limits(0.01, 1.0), 5)
-      .addGrid(lr.fitIntercept)
-      .addRandom(lr.elasticNetParam, Limits(0.0, 1.0), 5)
-      .build()
-
-    println(s"hyperparameters:\n${hyperparameters.mkString("\n")}")
-
-    val cv: CrossValidator = new CrossValidator()
-      .setEstimator(lr)
-      .setEstimatorParamMaps(hyperparameters)
-      .setEvaluator(new RegressionEvaluator)
-      .setNumFolds(3)
-    val cvModel: CrossValidatorModel = cv.fit(data)
-    val parent: LinearRegression = cvModel.bestModel.parent.asInstanceOf[LinearRegression]
-
-    println(s"""Optimal model has:
-         |${lr.regParam}        = ${parent.getRegParam}
-         |${lr.elasticNetParam} = ${parent.getElasticNetParam}
-         |${lr.fitIntercept}    = ${parent.getFitIntercept}""".stripMargin)
-    // $example off$
-
-    spark.stop()
-  }
-  // scalastyle:on println
-}
diff --git a/mllib/src/main/scala/org/apache/spark/ml/tuning/ParamRandomBuilder.scala b/mllib/src/main/scala/org/apache/spark/ml/tuning/ParamRandomBuilder.scala
deleted file mode 100644
index 9c296bb..0000000
--- a/mllib/src/main/scala/org/apache/spark/ml/tuning/ParamRandomBuilder.scala
+++ /dev/null
@@ -1,160 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.ml.tuning
-
-import org.apache.spark.annotation.Since
-import org.apache.spark.ml.param._
-import org.apache.spark.ml.tuning.RandomRanges._
-
-case class Limits[T: Numeric](x: T, y: T)
-
-private[ml] abstract class RandomT[T: Numeric] {
-  def randomT(): T
-  def randomTLog(n: Int): T
-}
-
-abstract class Generator[T: Numeric] {
-  def apply(lim: Limits[T]): RandomT[T]
-}
-
-object RandomRanges {
-
-  private val rnd = new scala.util.Random
-
-  private[tuning] def randomBigInt0To(x: BigInt): BigInt = {
-    var randVal = BigInt(x.bitLength, rnd)
-    while (randVal > x) {
-      randVal = BigInt(x.bitLength, rnd)
-    }
-    randVal
-  }
-
-  private[ml] def bigIntBetween(lower: BigInt, upper: BigInt): BigInt = {
-    val diff: BigInt = upper - lower
-    randomBigInt0To(diff) + lower
-  }
-
-  private def randomBigDecimalBetween(lower: BigDecimal, upper: BigDecimal): BigDecimal = {
-    val zeroCenteredRnd: BigDecimal = BigDecimal(rnd.nextDouble() - 0.5)
-    val range: BigDecimal = upper - lower
-    val halfWay: BigDecimal = lower + range / 2
-    (zeroCenteredRnd * range) + halfWay
-  }
-
-  implicit object DoubleGenerator extends Generator[Double] {
-    def apply(limits: Limits[Double]): RandomT[Double] = new RandomT[Double] {
-      import limits._
-      val lower: Double = math.min(x, y)
-      val upper: Double = math.max(x, y)
-
-      override def randomTLog(n: Int): Double =
-        RandomRanges.randomLog(lower, upper, n)
-
-      override def randomT(): Double =
-        randomBigDecimalBetween(BigDecimal(lower), BigDecimal(upper)).doubleValue
-    }
-  }
-
-  implicit object FloatGenerator extends Generator[Float] {
-    def apply(limits: Limits[Float]): RandomT[Float] = new RandomT[Float] {
-      import limits._
-      val lower: Float = math.min(x, y)
-      val upper: Float = math.max(x, y)
-
-      override def randomTLog(n: Int): Float =
-        RandomRanges.randomLog(lower, upper, n).toFloat
-
-      override def randomT(): Float =
-        randomBigDecimalBetween(BigDecimal(lower), BigDecimal(upper)).floatValue
-    }
-  }
-
-  implicit object IntGenerator extends Generator[Int] {
-    def apply(limits: Limits[Int]): RandomT[Int] = new RandomT[Int] {
-      import limits._
-      val lower: Int = math.min(x, y)
-      val upper: Int = math.max(x, y)
-
-      override def randomTLog(n: Int): Int =
-        RandomRanges.randomLog(lower, upper, n).toInt
-
-      override def randomT(): Int =
-        bigIntBetween(BigInt(lower), BigInt(upper)).intValue
-    }
-  }
-
-  private[ml] def logN(x: Double, base: Int): Double = math.log(x) / math.log(base)
-
-  private[ml] def randomLog(lower: Double, upper: Double, n: Int): Double = {
-    val logLower: Double = logN(lower, n)
-    val logUpper: Double = logN(upper, n)
-    val logLimits: Limits[Double] = Limits(logLower, logUpper)
-    val rndLogged: RandomT[Double] = RandomRanges(logLimits)
-    math.pow(n, rndLogged.randomT())
-  }
-
-  private[ml] def apply[T: Generator](lim: Limits[T])(implicit t: Generator[T]): RandomT[T] = t(lim)
-
-}
-
-/**
- * "For any distribution over a sample space with a finite maximum, the maximum of 60 random
- * observations lies within the top 5% of the true maximum, with 95% probability"
- * - Evaluating Machine Learning Models by Alice Zheng
- * https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html
- *
- * Note: if you want more sophisticated hyperparameter tuning, consider Python libraries
- * such as Hyperopt.
- */
-@Since("3.2.0")
-class ParamRandomBuilder extends ParamGridBuilder {
-  def addRandom[T: Generator](param: Param[T], lim: Limits[T], n: Int): this.type = {
-    val gen: RandomT[T] = RandomRanges(lim)
-    addGrid(param, (1 to n).map { _: Int => gen.randomT() })
-  }
-
-  def addLog10Random[T: Generator](param: Param[T], lim: Limits[T], n: Int): this.type =
-    addLogRandom(param, lim, n, 10)
-
-  private def addLogRandom[T: Generator](param: Param[T], lim: Limits[T],
-                                         n: Int, base: Int): this.type = {
-    val gen: RandomT[T] = RandomRanges(lim)
-    addGrid(param, (1 to n).map { _: Int => gen.randomTLog(base) })
-  }
-
-  // specialized versions for Java.
-
-  def addRandom(param: DoubleParam, x: Double, y: Double, n: Int): this.type =
-    addRandom(param, Limits(x, y), n)(DoubleGenerator)
-
-  def addLog10Random(param: DoubleParam, x: Double, y: Double, n: Int): this.type =
-    addLogRandom(param, Limits(x, y), n, 10)(DoubleGenerator)
-
-  def addRandom(param: FloatParam, x: Float, y: Float, n: Int): this.type =
-    addRandom(param, Limits(x, y), n)(FloatGenerator)
-
-  def addLog10Random(param: FloatParam, x: Float, y: Float, n: Int): this.type =
-    addLogRandom(param, Limits(x, y), n, 10)(FloatGenerator)
-
-  def addRandom(param: IntParam, x: Int, y: Int, n: Int): this.type =
-    addRandom(param, Limits(x, y), n)(IntGenerator)
-
-  def addLog10Random(param: IntParam, x: Int, y: Int, n: Int): this.type =
-    addLogRandom(param, Limits(x, y), n, 10)(IntGenerator)
-
-}
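
The essence of the removed `randomTLog`/`randomLog` machinery is log-uniform sampling: draw uniformly in log space, then exponentiate back. A minimal self-contained sketch of the technique, in plain Scala and independent of the classes above (`logUniform` is an illustrative name, not part of any Spark API):

    import scala.util.Random

    object LogUniformSketch {
      // Draw uniformly in base-`base` log space, then map back with pow.
      // Over [0.01, 1.0] this makes values near 0.01, 0.1 and 1.0 roughly
      // equally likely, matching the comments in the removed examples.
      def logUniform(lower: Double, upper: Double, base: Int = 10): Double = {
        require(lower > 0 && upper > 0, "log-space sampling needs strictly positive bounds")
        require(lower <= upper, "lower bound must not exceed upper bound")
        def logN(x: Double): Double = math.log(x) / math.log(base)
        val u = logN(lower) + Random.nextDouble() * (logN(upper) - logN(lower))
        math.pow(base, u)
      }

      def main(args: Array[String]): Unit =
        println(Seq.fill(5)(logUniform(0.01, 1.0)).mkString(", "))
    }

Sampling in log space is why, in the removed examples, the regularization parameter covers each decade of [0.01, 1.0] evenly instead of clustering near 1.0.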
diff --git a/mllib/src/test/scala/org/apache/spark/ml/tuning/ParamRandomBuilderSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/tuning/ParamRandomBuilderSuite.scala
deleted file mode 100644
index e17c48e..0000000
--- a/mllib/src/test/scala/org/apache/spark/ml/tuning/ParamRandomBuilderSuite.scala
+++ /dev/null
@@ -1,123 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.ml.tuning
-
-import org.scalatest.matchers.must.Matchers
-import org.scalatestplus.scalacheck.ScalaCheckDrivenPropertyChecks
-
-import org.apache.spark.SparkFunSuite
-import org.apache.spark.ml.param._
-
-class ParamRandomBuilderSuite extends SparkFunSuite with ScalaCheckDrivenPropertyChecks
-  with Matchers {
-
-  val solver = new TestParams() {
-    private val randomColName = "randomVal"
-    val DummyDoubleParam = new DoubleParam(this, randomColName, "doc")
-    val DummyFloatParam = new FloatParam(this, randomColName, "doc")
-    val DummyIntParam = new IntParam(this, randomColName, "doc")
-  }
-  import solver._
-
-  val DoubleLimits: Limits[Double] = Limits(1d, 100d)
-  val FloatLimits: Limits[Float] = Limits(1f, 100f)
-  val IntLimits: Limits[Int] = Limits(1, 100)
-  val nRandoms: Int = 5
-
-  // Java API
-
-  test("Java API random Double linear params mixed with fixed values") {
-    checkRangeAndCardinality(
-      _.addRandom(DummyDoubleParam, DoubleLimits.x, DoubleLimits.y, nRandoms),
-      DoubleLimits,
-      DummyDoubleParam)
-  }
-
-  test("Java API random Double log10 params mixed with fixed values") {
-    checkRangeAndCardinality(
-      _.addLog10Random(DummyDoubleParam, DoubleLimits.x, DoubleLimits.y, nRandoms),
-      DoubleLimits,
-      DummyDoubleParam)
-  }
-
-  test("Java API random Float linear params mixed with fixed values") {
-    checkRangeAndCardinality(
-      _.addRandom(DummyFloatParam, FloatLimits.x, FloatLimits.y, nRandoms),
-      FloatLimits,
-      DummyFloatParam)
-  }
-
-  test("Java API random Float log10 params mixed with fixed values") {
-    checkRangeAndCardinality(
-      _.addLog10Random(DummyFloatParam, FloatLimits.x, FloatLimits.y, nRandoms),
-      FloatLimits,
-      DummyFloatParam)
-  }
-
-  test("Java API random Int linear params mixed with fixed values") {
-    checkRangeAndCardinality(
-      _.addRandom(DummyIntParam, IntLimits.x, IntLimits.y, nRandoms),
-      IntLimits,
-      DummyIntParam)
-  }
-
-  test("Java API random Int log10 params mixed with fixed values") {
-    checkRangeAndCardinality(
-      _.addLog10Random(DummyIntParam, IntLimits.x, IntLimits.y, nRandoms),
-      IntLimits,
-      DummyIntParam)
-  }
-
-  // Scala API
-
-  test("random linear params mixed with fixed values") {
-    import RandomRanges._
-    checkRangeAndCardinality(_.addRandom(DummyDoubleParam, DoubleLimits, nRandoms),
-      DoubleLimits,
-      DummyDoubleParam)
-  }
-
-  test("random log10 params mixed with fixed values") {
-    import RandomRanges._
-    checkRangeAndCardinality(_.addLog10Random(DummyDoubleParam, DoubleLimits, nRandoms),
-      DoubleLimits,
-      DummyDoubleParam)
-  }
-
-  def checkRangeAndCardinality[T: Numeric](addFn: ParamRandomBuilder => ParamRandomBuilder,
-                               lim: Limits[T],
-                               randomCol: Param[T]): Unit = {
-    val maxIterations: Int = 10
-    val basedOn: Array[ParamPair[_]] = Array(maxIter -> maxIterations)
-    val inputCols: Array[String] = Array("input0", "input1")
-    val ops: Numeric[T] = implicitly[Numeric[T]]
-
-    val builder: ParamRandomBuilder = new ParamRandomBuilder()
-      .baseOn(basedOn: _*)
-      .addGrid(inputCol, inputCols)
-    val paramMap: Array[ParamMap] = addFn(builder).build()
-    assert(paramMap.length == inputCols.length * nRandoms * basedOn.length)
-    paramMap.foreach { m: ParamMap =>
-      assert(m(maxIter) == maxIterations)
-      assert(inputCols contains m(inputCol))
-      assert(ops.gteq(m(randomCol), lim.x))
-      assert(ops.lteq(m(randomCol), lim.y))
-    }
-  }
-
-}
diff --git a/mllib/src/test/scala/org/apache/spark/ml/tuning/RandomRangesSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/tuning/RandomRangesSuite.scala
deleted file mode 100644
index afcbc03..0000000
--- a/mllib/src/test/scala/org/apache/spark/ml/tuning/RandomRangesSuite.scala
+++ /dev/null
@@ -1,168 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.ml.tuning
-
-import scala.reflect.runtime.universe.TypeTag
-
-import org.scalacheck.{Arbitrary, Gen}
-import org.scalacheck.Arbitrary._
-import org.scalacheck.Gen.Choose
-import org.scalatest.{Assertion, Succeeded}
-import org.scalatest.matchers.must.Matchers
-import org.scalatestplus.scalacheck.ScalaCheckDrivenPropertyChecks
-
-import org.apache.spark.SparkFunSuite
-
-class RandomRangesSuite extends SparkFunSuite with ScalaCheckDrivenPropertyChecks with Matchers {
-
-  import RandomRanges._
-
-  test("log of any base") {
-    assert(logN(16, 4) == 2d)
-    assert(logN(1000, 10) === (3d +- 0.000001))
-    assert(logN(256, 2) == 8d)
-  }
-
-  test("random doubles in log space") {
-    val gen: Gen[(Double, Double, Int)] = for {
-      x <- Gen.choose(0d, Double.MaxValue)
-      y <- Gen.choose(0d, Double.MaxValue)
-      n <- Gen.choose(0, Int.MaxValue)
-    } yield (x, y, n)
-    forAll(gen) { case (x, y, n) =>
-      val lower = math.min(x, y)
-      val upper = math.max(x, y)
-      val result = randomLog(x, y, n)
-      assert(result >= lower && result <= upper)
-    }
-  }
-
-  test("random BigInt generation does not go into infinite loop") {
-    assert(randomBigInt0To(0) == BigInt(0))
-  }
-
-  test("random ints") {
-    checkRange(Linear[Int])
-  }
-
-  test("random log ints") {
-    checkRange(Log10[Int])
-  }
-
-  test("random int distribution") {
-    checkDistributionOf(1000)
-  }
-
-  test("random doubles") {
-    checkRange(Linear[Double])
-  }
-
-  test("random log doubles") {
-    checkRange(Log10[Double])
-  }
-
-  test("random double distribution") {
-    checkDistributionOf(1000d)
-  }
-
-  test("random floats") {
-    checkRange(Linear[Float])
-  }
-
-  test("random log floats") {
-    checkRange(Log10[Float])
-  }
-
-  test("random float distribution") {
-    checkDistributionOf(1000f)
-  }
-
-  private abstract class RandomFn[T: Numeric: Generator] {
-    def apply(genRandom: RandomT[T]): T = genRandom.randomT()
-    def appropriate(x: T, y: T): Boolean
-  }
-
-  private def Linear[T: Numeric: Generator]: RandomFn[T] = new RandomFn {
-    override def apply(genRandom: RandomT[T]): T = genRandom.randomT()
-    override def appropriate(x: T, y: T): Boolean = true
-  }
-
-  private def Log10[T: Numeric: Generator]: RandomFn[T] = new RandomFn {
-    override def apply(genRandom: RandomT[T]): T = genRandom.randomTLog(10)
-    val ops: Numeric[T] = implicitly[Numeric[T]]
-    override def appropriate(x: T, y: T): Boolean = {
-      ops.gt(x, ops.zero) && ops.gt(y, ops.zero) && x != y
-    }
-  }
-
-  private def checkRange[T: Numeric: Generator: Choose: TypeTag: Arbitrary]
-  (rand: RandomFn[T]): Assertion =
-    forAll { (x: T, y: T) =>
-      if (rand.appropriate(x, y)) {
-        val ops: Numeric[T] = implicitly[Numeric[T]]
-        val limit: Limits[T] = Limits(x, y)
-        val gen: RandomT[T] = RandomRanges(limit)
-        val result: T = rand(gen)
-        val ordered: (T, T) = lowerUpper(x, y)
-        assert(ops.gteq(result, ordered._1) && ops.lteq(result, ordered._2))
-      } else Succeeded
-    }
-
-  private def checkDistributionOf[T: Numeric: Generator: Choose](range: T): Unit = {
-    val ops: Numeric[T] = implicitly[Numeric[T]]
-    import ops._
-    val gen: Gen[(T, T)] = for {
-      x <- Gen.choose(negate(range), range)
-      y <- Gen.choose(range, times(range, plus(one, one)))
-    } yield (x, y)
-    forAll(gen) { case (x, y) =>
-      assertEvenDistribution(10000, Limits(x, y))
-    }
-  }
-
-  private def meanAndStandardDeviation[T: Numeric](xs: Seq[T]): (Double, Double) = {
-    val ops: Numeric[T] = implicitly[Numeric[T]]
-    val n: Int = xs.length
-    val mean: Double = ops.toDouble(xs.sum) / n
-    val squaredDiff: Seq[Double] = xs.map { x: T => math.pow(ops.toDouble(x) - mean, 2) }
-    val stdDev: Double = math.pow(squaredDiff.sum / (n - 1), 0.5)
-    (mean, stdDev)
-  }
-
-  private def lowerUpper[T: Numeric](x: T, y: T): (T, T) = {
-    val ops: Numeric[T] = implicitly[Numeric[T]]
-    (ops.min(x, y), ops.max(x, y))
-  }
-
-  private def midPointOf[T: Numeric : Generator](lim: Limits[T]): Double = {
-    val ordered: (T, T) = lowerUpper(lim.x, lim.y)
-    val ops: Numeric[T] = implicitly[Numeric[T]]
-    val range: T = ops.minus(ordered._2, ordered._1)
-    (ops.toDouble(range) / 2) + ops.toDouble(ordered._1)
-  }
-
-  private def assertEvenDistribution[T: Numeric: Generator](n: Int, lim: Limits[T]): Assertion = {
-    val gen: RandomT[T] = RandomRanges(lim)
-    val xs: Seq[T] = (0 to n).map { _: Int => gen.randomT() }
-    val (mean, stdDev) = meanAndStandardDeviation(xs)
-    val tolerance: Double = 4 * stdDev
-    val halfWay: Double = midPointOf(lim)
-    assert(mean > halfWay - tolerance && mean < halfWay + tolerance)
-  }
-
-}
diff --git a/python/docs/source/reference/pyspark.ml.rst b/python/docs/source/reference/pyspark.ml.rst
index fc6060c..7837d60 100644
--- a/python/docs/source/reference/pyspark.ml.rst
+++ b/python/docs/source/reference/pyspark.ml.rst
@@ -288,7 +288,6 @@ Tuning
     :toctree: api/
 
     ParamGridBuilder
-    ParamRandomBuilder
     CrossValidator
     CrossValidatorModel
     TrainValidationSplit
diff --git a/python/pyspark/ml/tests/test_tuning.py b/python/pyspark/ml/tests/test_tuning.py
index 21551fd..83baf3e 100644
--- a/python/pyspark/ml/tests/test_tuning.py
+++ b/python/pyspark/ml/tests/test_tuning.py
@@ -16,7 +16,6 @@
 #
 
 import tempfile
-import math
 import unittest
 
 import numpy as np
@@ -28,7 +27,7 @@ from pyspark.ml.evaluation import BinaryClassificationEvaluator, \
 from pyspark.ml.linalg import Vectors
 from pyspark.ml.param import Param, Params
 from pyspark.ml.tuning import CrossValidator, CrossValidatorModel, ParamGridBuilder, \
-    TrainValidationSplit, TrainValidationSplitModel, ParamRandomBuilder
+    TrainValidationSplit, TrainValidationSplitModel
 from pyspark.sql.functions import rand
 from pyspark.testing.mlutils import DummyEvaluator, DummyLogisticRegression, \
     DummyLogisticRegressionModel, SparkSessionTestCase
@@ -67,108 +66,6 @@ class InducedErrorEstimator(Estimator, HasInducedError):
         return model
 
 
-class DummyParams(Params):
-
-    def __init__(self):
-        super(DummyParams, self).__init__()
-        self.test_param = Param(self, "test_param", "dummy parameter for testing")
-        self.another_test_param = Param(self, "another_test_param", "second parameter for testing")
-
-
-class ParamRandomBuilderTests(unittest.TestCase):
-
-    def __init__(self, methodName):
-        super(ParamRandomBuilderTests, self).__init__(methodName=methodName)
-        self.dummy_params = DummyParams()
-        self.to_test = ParamRandomBuilder()
-        self.n = 100
-
-    def check_ranges(self, params, lowest, highest, expected_type):
-        self.assertEqual(self.n, len(params))
-        for param in params:
-            for v in param.values():
-                self.assertGreaterEqual(v, lowest)
-                self.assertLessEqual(v, highest)
-                self.assertEqual(type(v), expected_type)
-
-    def check_addRandom_ranges(self, x, y, expected_type):
-        params = self.to_test.addRandom(self.dummy_params.test_param, x, y, self.n).build()
-        self.check_ranges(params, x, y, expected_type)
-
-    def check_addLog10Random_ranges(self, x, y, expected_type):
-        params = self.to_test.addLog10Random(self.dummy_params.test_param, x, y, self.n).build()
-        self.check_ranges(params, x, y, expected_type)
-
-    @staticmethod
-    def counts(xs):
-        key_to_count = {}
-        for v in xs:
-            k = int(v)
-            if key_to_count.get(k) is None:
-                key_to_count[k] = 1
-            else:
-                key_to_count[k] = key_to_count[k] + 1
-        return key_to_count
-
-    @staticmethod
-    def raw_values_of(params):
-        values = []
-        for param in params:
-            for v in param.values():
-                values.append(v)
-        return values
-
-    def check_even_distribution(self, vs, bin_function):
-        binned = map(lambda x: bin_function(x), vs)
-        histogram = self.counts(binned)
-        values = list(histogram.values())
-        sd = np.std(values)
-        mu = np.mean(values)
-        for k, v in histogram.items():
-            self.assertLess(abs(v - mu), 5 * sd, "{} values for bucket {} is unlikely "
-                                                 "when the mean is {} and standard deviation {}"
-                            .format(v, k, mu, sd))
-
-    def test_distribution(self):
-        params = self.to_test.addRandom(self.dummy_params.test_param, 0, 20000, 10000).build()
-        values = self.raw_values_of(params)
-        self.check_even_distribution(values, lambda x: x // 1000)
-
-    def test_logarithmic_distribution(self):
-        params = self.to_test.addLog10Random(self.dummy_params.test_param, 1, 1e10, 10000).build()
-        values = self.raw_values_of(params)
-        self.check_even_distribution(values, lambda x: math.log10(x))
-
-    def test_param_cardinality(self):
-        num_random_params = 7
-        values = [1, 2, 3]
-        self.to_test.addRandom(self.dummy_params.test_param, 1, 10, num_random_params)
-        self.to_test.addGrid(self.dummy_params.another_test_param, values)
-        self.assertEqual(len(self.to_test.build()), num_random_params * len(values))
-
-    def test_add_random_integer_logarithmic_range(self):
-        self.check_addLog10Random_ranges(100, 200, int)
-
-    def test_add_logarithmic_random_float_and_integer_yields_floats(self):
-        self.check_addLog10Random_ranges(100, 200., float)
-
-    def test_add_random_float_logarithmic_range(self):
-        self.check_addLog10Random_ranges(100., 200., float)
-
-    def test_add_random_integer_range(self):
-        self.check_addRandom_ranges(100, 200, int)
-
-    def test_add_random_float_and_integer_yields_floats(self):
-        self.check_addRandom_ranges(100, 200., float)
-
-    def test_add_random_float_range(self):
-        self.check_addRandom_ranges(100., 200., float)
-
-    def test_unexpected_type(self):
-        with self.assertRaises(TypeError):
-            self.to_test.addRandom(self.dummy_params.test_param, 1, "wrong type", 1).build()
-
-
 class ParamGridBuilderTests(SparkSessionTestCase):
 
     def test_addGrid(self):
diff --git a/python/pyspark/ml/tuning.py b/python/pyspark/ml/tuning.py
index 2c8b9d8..2436abb 100644
--- a/python/pyspark/ml/tuning.py
+++ b/python/pyspark/ml/tuning.py
@@ -18,8 +18,6 @@
 import os
 import sys
 import itertools
-import random
-import math
 from multiprocessing.pool import ThreadPool
 
 import numpy as np
@@ -37,7 +35,7 @@ from pyspark.sql.functions import col, lit, rand, UserDefinedFunction
 from pyspark.sql.types import BooleanType
 
 __all__ = ['ParamGridBuilder', 'CrossValidator', 'CrossValidatorModel', 'TrainValidationSplit',
-           'TrainValidationSplitModel', 'ParamRandomBuilder']
+           'TrainValidationSplitModel']
 
 
 def _parallelFitTasks(est, train, eva, validation, epm, collectSubModel):
@@ -154,50 +152,6 @@ class ParamGridBuilder(object):
         return [dict(to_key_value_pairs(keys, prod)) for prod in itertools.product(*grid_values)]
 
 
-class ParamRandomBuilder(ParamGridBuilder):
-    r"""
-    Builder for random value parameters used in search-based model selection.
-
-
-    .. versionadded:: 3.2.0
-    """
-
-    @since("3.2.0")
-    def addRandom(self, param, x, y, n):
-        """
-        Adds n random values between x and y.
-        The arguments x and y can be integers, floats or a combination of the two. If either
-        x or y is a float, the domain of the random value will be float.
-        """
-        if type(x) == int and type(y) == int:
-            values = map(lambda _: random.randrange(x, y), range(n))
-        elif type(x) == float or type(y) == float:
-            values = map(lambda _: random.uniform(x, y), range(n))
-        else:
-            raise TypeError("unable to make range for types %s and %s" % (type(x), type(y)))
-        self.addGrid(param, values)
-        return self
-
-    @since("3.2.0")
-    def addLog10Random(self, param, x, y, n):
-        """
-        Adds n random values scaled logarithmically (base 10) between x and y.
-        For instance, a distribution for x=1.0, y=10000.0 and n=5 might reasonably look like
-        [1.6, 65.3, 221.9, 1024.3, 8997.5]
-        """
-        def logarithmic_random():
-            rand = random.uniform(math.log10(x), math.log10(y))
-            value = 10 ** rand
-            if type(x) == int and type(y) == int:
-                value = int(value)
-            return value
-
-        values = map(lambda _: logarithmic_random(), range(n))
-        self.addGrid(param, values)
-
-        return self
-
-
 class _ValidatorParams(HasSeed):
     """
     Common params for TrainValidationSplit and CrossValidator.
diff --git a/python/pyspark/ml/tuning.pyi b/python/pyspark/ml/tuning.pyi
index 028cebd..912abd4 100644
--- a/python/pyspark/ml/tuning.pyi
+++ b/python/pyspark/ml/tuning.pyi
@@ -35,11 +35,6 @@ class ParamGridBuilder:
     def baseOn(self, *args: Tuple[Param, Any]) -> ParamGridBuilder: ...
     def build(self) -> List[ParamMap]: ...
 
-class ParamRandomBuilder(ParamGridBuilder):
-    def __init__(self) -> None: ...
-    def addRandom(self, param: Param, x: Any, y: Any, n: int) -> ParamRandomBuilder: ...
-    def addLog10Random(self, param: Param, x: Any, y: Any, n: int) -> ParamRandomBuilder: ...
-
 class _ValidatorParams(HasSeed):
     estimator: Param[Estimator]
     estimatorParamMaps: Param[List[ParamMap]]
