Posted to commits@spark.apache.org by we...@apache.org on 2021/01/16 03:12:08 UTC

[spark] branch branch-3.1 updated: [SPARK-34080][ML][PYTHON] Add UnivariateFeatureSelector

This is an automated email from the ASF dual-hosted git repository.

weichenxu123 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new cb8fb0e  [SPARK-34080][ML][PYTHON] Add UnivariateFeatureSelector
cb8fb0e is described below

commit cb8fb0e3c43743dacc7a5e06d028ff60b49d9a5b
Author: Huaxin Gao <hu...@us.ibm.com>
AuthorDate: Sat Jan 16 11:09:23 2021 +0800

    [SPARK-34080][ML][PYTHON] Add UnivariateFeatureSelector
    
    ### What changes were proposed in this pull request?
    Add UnivariateFeatureSelector
    
    ### Why are the changes needed?
    Provide a single UnivariateFeatureSelector, so we don't need three separate feature selectors.
    
    ### Does this PR introduce _any_ user-facing change?
    Yes
    ```
    selector = UnivariateFeatureSelector(featureCols=["x", "y", "z"], labelCol="target",
                                         featureType="categorical", labelType="continuous",
                                         selectorType="numTopFeatures", numTopFeatures=100)
    ```
    
    Or, specifying the score function directly instead of `featureType`/`labelType`:
    ```
    selector = UnivariateFeatureSelector(featureCols=["x", "y", "z"], labelCol="target",
                                         scoreFunction="f_classif",
                                         selectorType="numTopFeatures", numTopFeatures=100)
    ```
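
    Note that the API as merged (visible in the diff below) differs slightly from the sketches
    above: features are passed as a single vector column via `featuresCol`, and the selection
    mode and threshold are configured through `selectionMode`/`selectionThreshold`. A minimal
    runnable sketch against the merged PySpark API, mirroring the patch's own
    univariate_feature_selector_example.py (the app name is illustrative; the data values are
    taken from the examples in this patch):
    ```
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import UnivariateFeatureSelector
    from pyspark.ml.linalg import Vectors

    spark = SparkSession.builder.appName("UnivariateFeatureSelectorSketch").getOrCreate()

    # Six rows of continuous features with a categorical (double-encoded) label.
    df = spark.createDataFrame([
        (1, Vectors.dense([1.7, 4.4, 7.6, 5.8, 9.6, 2.3]), 3.0),
        (2, Vectors.dense([8.8, 7.3, 5.7, 7.3, 2.2, 4.1]), 2.0),
        (3, Vectors.dense([1.2, 9.5, 2.5, 3.1, 8.7, 2.5]), 3.0),
        (4, Vectors.dense([3.7, 9.2, 6.1, 4.1, 7.5, 3.8]), 2.0),
        (5, Vectors.dense([8.9, 5.2, 7.8, 8.3, 5.2, 3.0]), 4.0),
        (6, Vectors.dense([7.9, 8.5, 9.2, 4.0, 9.4, 2.1]), 4.0)], ["id", "features", "label"])

    # Continuous features + categorical label => Spark picks the f_classif (ANOVA F-test)
    # score function automatically; keep only the single top feature.
    selector = UnivariateFeatureSelector(featuresCol="features", outputCol="selectedFeatures",
                                         labelCol="label", selectionMode="numTopFeatures")
    selector.setFeatureType("continuous").setLabelType("categorical").setSelectionThreshold(1)

    selector.fit(df).transform(df).show()
    spark.stop()
    ```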
    
    ### How was this patch tested?
    Add unit tests.
    
    Closes #31160 from huaxingao/UnivariateSelector.
    
    Authored-by: Huaxin Gao <hu...@us.ibm.com>
    Signed-off-by: Weichen Xu <we...@databricks.com>
    (cherry picked from commit f3548837c643b2da03ce6b20b5b103e4392e52dc)
    Signed-off-by: Weichen Xu <we...@databricks.com>
---
 docs/ml-features.md                                | 111 +---
 docs/ml-statistics.md                              |  54 +-
 .../spark/examples/ml/JavaANOVATestExample.java    |  75 ---
 .../examples/ml/JavaFValueSelectorExample.java     |  81 ---
 .../spark/examples/ml/JavaFValueTestExample.java   |  75 ---
 ...a => JavaUnivariateFeatureSelectorExample.java} |  21 +-
 examples/src/main/python/ml/anova_test_example.py  |  50 --
 .../src/main/python/ml/fvalue_selector_example.py  |  53 --
 examples/src/main/python/ml/fvalue_test_example.py |  50 --
 ...e.py => univariate_feature_selector_example.py} |  16 +-
 .../spark/examples/ml/ANOVATestExample.scala       |  63 --
 .../spark/examples/ml/FValueSelectorExample.scala  |  69 ---
 .../spark/examples/ml/FValueTestExample.scala      |  63 --
 ...cala => UnivariateFeatureSelectorExample.scala} |  20 +-
 .../apache/spark/ml/feature/ANOVASelector.scala    | 195 ------
 .../apache/spark/ml/feature/ChiSqSelector.scala    |   1 +
 .../apache/spark/ml/feature/FValueSelector.scala   | 195 ------
 .../org/apache/spark/ml/feature/Selector.scala     |  12 +-
 .../ml/feature/UnivariateFeatureSelector.scala     | 467 ++++++++++++++
 .../scala/org/apache/spark/ml/stat/ANOVATest.scala |   2 +-
 .../org/apache/spark/ml/stat/FValueTest.scala      |   2 +-
 .../spark/ml/feature/ANOVASelectorSuite.scala      | 206 -------
 .../spark/ml/feature/FValueSelectorSuite.scala     | 238 -------
 .../feature/UnivariateFeatureSelectorSuite.scala   | 685 +++++++++++++++++++++
 python/docs/source/reference/pyspark.ml.rst        |   8 +-
 python/pyspark/ml/feature.py                       | 449 +++++++-------
 python/pyspark/ml/feature.pyi                      | 116 ++--
 python/pyspark/ml/stat.py                          | 148 -----
 python/pyspark/ml/stat.pyi                         |  12 -
 29 files changed, 1512 insertions(+), 2025 deletions(-)

diff --git a/docs/ml-features.md b/docs/ml-features.md
index 660c272..dc87713 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -1793,19 +1793,28 @@ for more details on the API.
 </div>
 </div>
 
-## ANOVASelector
+## UnivariateFeatureSelector
 
-`ANOVASelector` operates on categorical labels with continuous features. It uses the
-[one-way ANOVA F-test](https://en.wikipedia.org/wiki/F-test#Multiple-comparison_ANOVA_problems) to decide which
-features to choose.
-It supports five selection methods: `numTopFeatures`, `percentile`, `fpr`, `fdr`, `fwe`:
-* `numTopFeatures` chooses a fixed number of top features according to ANOVA F-test.
+`UnivariateFeatureSelector` operates on categorical/continuous labels with categorical/continuous features.
+Users can set `featureType` and `labelType`, and Spark will pick the score function to use based on the
+specified `featureType` and `labelType`:
+
+~~~
+featureType | labelType   | score function
+------------|-------------|---------------
+categorical | categorical | chi2
+continuous  | categorical | f_classif
+continuous  | continuous  | f_regression
+~~~
+
+It supports five selection modes: `numTopFeatures`, `percentile`, `fpr`, `fdr`, `fwe`:
+* `numTopFeatures` chooses a fixed number of top features.
 * `percentile` is similar to `numTopFeatures` but chooses a fraction of all features instead of a fixed number.
 * `fpr` chooses all features whose p-values are below a threshold, thus controlling the false positive rate of selection.
 * `fdr` uses the [Benjamini-Hochberg procedure](https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure) to choose all features whose false discovery rate is below a threshold.
 * `fwe` chooses all features whose p-values are below a threshold. The threshold is scaled by 1/numFeatures, thus controlling the family-wise error rate of selection.
-By default, the selection method is `numTopFeatures`, with the default number of top features set to 50.
-The user can choose a selection method using `setSelectorType`.
+
+By default, the selection mode is `numTopFeatures`, with the default `selectionThreshold` set to 50.
 
 **Examples**
 
@@ -1823,7 +1832,7 @@ id | features                       | label
  6 | [7.9, 8.5, 9.2, 4.0, 9.4, 2.1] | 4.0
 ~~~
 
-If we use `ANOVASelector` with `numTopFeatures = 1`, the
+If we set `featureType` to `continuous` and `labelType` to `categorical` with `selectionThreshold = 1`, the
 last column in our `features` is chosen as the most useful feature:
 
 ~~~
@@ -1840,96 +1849,26 @@ id | features                       | label   | selectedFeatures
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
 
-Refer to the [ANOVASelector Scala docs](api/scala/org/apache/spark/ml/feature/ANOVASelector.html)
-for more details on the API.
-
-{% include_example scala/org/apache/spark/examples/ml/ANOVASelectorExample.scala %}
-</div>
-
-<div data-lang="java" markdown="1">
-
-Refer to the [ANOVASelector Java docs](api/java/org/apache/spark/ml/feature/ANOVASelector.html)
-for more details on the API.
-
-{% include_example java/org/apache/spark/examples/ml/JavaANOVASelectorExample.java %}
-</div>
-
-<div data-lang="python" markdown="1">
-
-Refer to the [ANOVASelector Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.ANOVASelector)
-for more details on the API.
-
-{% include_example python/ml/anova_selector_example.py %}
-</div>
-</div>
-
-## FValueSelector
-
-`FValueSelector` operates on categorical labels with continuous features. It uses the
-[F-test for regression](https://en.wikipedia.org/wiki/F-test#Regression_problems) to decide which
-features to choose.
-It supports five selection methods: `numTopFeatures`, `percentile`, `fpr`, `fdr`, `fwe`:
-* `numTopFeatures` chooses a fixed number of top features according to a F-test for regression.
-* `percentile` is similar to `numTopFeatures` but chooses a fraction of all features instead of a fixed number.
-* `fpr` chooses all features whose p-values are below a threshold, thus controlling the false positive rate of selection.
-* `fdr` uses the [Benjamini-Hochberg procedure](https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure) to choose all features whose false discovery rate is below a threshold.
-* `fwe` chooses all features whose p-values are below a threshold. The threshold is scaled by 1/numFeatures, thus controlling the family-wise error rate of selection.
-By default, the selection method is `numTopFeatures`, with the default number of top features set to 50.
-The user can choose a selection method using `setSelectorType`.
-
-**Examples**
-
-Assume that we have a DataFrame with the columns `id`, `features`, and `label`, which is used as
-our target to be predicted:
-
-~~~
-id | features                       | label
----|--------------------------------|---------
- 1 | [6.0, 7.0, 0.0, 7.0, 6.0, 0.0] | 4.6
- 2 | [0.0, 9.0, 6.0, 0.0, 5.0, 9.0] | 6.6
- 3 | [0.0, 9.0, 3.0, 0.0, 5.0, 5.0] | 5.1
- 4 | [0.0, 9.0, 8.0, 5.0, 6.0, 4.0] | 7.6
- 5 | [8.0, 9.0, 6.0, 5.0, 4.0, 4.0] | 9.0
- 6 | [8.0, 9.0, 6.0, 4.0, 0.0, 0.0] | 9.0
-~~~
-
-If we use `FValueSelector` with `numTopFeatures = 1`, the
-3rd column in our `features` is chosen as the most useful feature:
-
-~~~
-id | features                       | label   | selectedFeatures
----|--------------------------------|---------|------------------
- 1 | [6.0, 7.0, 0.0, 7.0, 6.0, 0.0] | 4.6     | [0.0]
- 2 | [0.0, 9.0, 6.0, 0.0, 5.0, 9.0] | 6.6     | [6.0]
- 3 | [0.0, 9.0, 3.0, 0.0, 5.0, 5.0] | 5.1     | [3.0]
- 4 | [0.0, 9.0, 8.0, 5.0, 6.0, 4.0] | 7.6     | [8.0]
- 5 | [8.0, 9.0, 6.0, 5.0, 4.0, 4.0] | 9.0     | [6.0]
- 6 | [8.0, 9.0, 6.0, 4.0, 0.0, 0.0] | 9.0     | [6.0]
-~~~
-
-<div class="codetabs">
-<div data-lang="scala" markdown="1">
-
-Refer to the [FValueSelector Scala docs](api/scala/org/apache/spark/ml/feature/FValueSelector.html)
+Refer to the [UnivariateFeatureSelector Scala docs](api/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.html)
 for more details on the API.
 
-{% include_example scala/org/apache/spark/examples/ml/FValueSelectorExample.scala %}
+{% include_example scala/org/apache/spark/examples/ml/UnivariateFeatureSelectorExample.scala %}
 </div>
 
 <div data-lang="java" markdown="1">
 
-Refer to the [FValueSelector Java docs](api/java/org/apache/spark/ml/feature/FValueSelector.html)
+Refer to the [UnivariateFeatureSelector Java docs](api/java/org/apache/spark/ml/feature/UnivariateFeatureSelector.html)
 for more details on the API.
 
-{% include_example java/org/apache/spark/examples/ml/JavaFValueSelectorExample.java %}
+{% include_example java/org/apache/spark/examples/ml/JavaUnivariateFeatureSelectorExample.java %}
 </div>
 
 <div data-lang="python" markdown="1">
 
-Refer to the [FValueSelector Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.FValueSelector)
+Refer to the [UnivariateFeatureSelector Python docs](api/python/reference/api/pyspark.ml.feature.UnivariateFeatureSelector.html)
 for more details on the API.
 
-{% include_example python/ml/anova_selector_example.py %}
+{% include_example python/ml/univariate_feature_selector_example.py %}
 </div>
 </div>
 
@@ -1974,7 +1913,7 @@ id | features                       | selectedFeatures
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
 
-Refer to the [VarianceThresholdSelector Scala docs]((api/python/pyspark.ml.html#pyspark.ml.feature.ChiSqSelector))
+Refer to the [VarianceThresholdSelector Scala docs](api/scala/org/apache/spark/ml/feature/VarianceThresholdSelector.html)
 for more details on the API.
 
 {% include_example scala/org/apache/spark/examples/ml/VarianceThresholdSelectorExample.scala %}
diff --git a/docs/ml-statistics.md b/docs/ml-statistics.md
index 637cdd6..334a42e 100644
--- a/docs/ml-statistics.md
+++ b/docs/ml-statistics.md
@@ -79,33 +79,7 @@ The output will be a DataFrame that contains the correlation matrix of the colum
 
 Hypothesis testing is a powerful tool in statistics to determine whether a result is statistically
 significant, whether this result occurred by chance or not. `spark.ml` currently supports Pearson's
-Chi-squared ( $\chi^2$) tests for independence, as well as ANOVA test for classification tasks and
-F-value test for regression tasks.
-
-### ANOVATest
-
-`ANOVATest` computes ANOVA F-values between labels and features for classification tasks. The labels should be categorical
-and features should be continuous.
-
-<div class="codetabs">
-<div data-lang="scala" markdown="1">
-Refer to the [`ANOVATest` Scala docs](api/scala/org/apache/spark/ml/stat/ANOVATest$.html) for details on the API.
-
-{% include_example scala/org/apache/spark/examples/ml/ANOVATestExample.scala %}
-</div>
-
-<div data-lang="java" markdown="1">
-Refer to the [`ANOVATest` Java docs](api/java/org/apache/spark/ml/stat/ANOVATest.html) for details on the API.
-
-{% include_example java/org/apache/spark/examples/ml/JavaANOVATestExample.java %}
-</div>
-
-<div data-lang="python" markdown="1">
-Refer to the [`ANOVATest` Python docs](api/python/index.html#pyspark.ml.stat.ANOVATest$) for details on the API.
-
-{% include_example python/ml/anova_test_example.py %}
-</div>
-</div>
+Chi-squared ($\chi^2$) tests for independence.
 
 ### ChiSquareTest
 
@@ -134,32 +108,6 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat
 
 </div>
 
-### FValueTest
-
-`FValueTest` computes F-values between labels and features for regression tasks. Both the labels
- and features should be continuous.
-
- <div class="codetabs">
- <div data-lang="scala" markdown="1">
- Refer to the [`FValueTest` Scala docs](api/scala/org/apache/spark/ml/stat/FValueTest$.html) for details on the API.
-
- {% include_example scala/org/apache/spark/examples/ml/FValueTestExample.scala %}
- </div>
-
- <div data-lang="java" markdown="1">
- Refer to the [`FValueTest` Java docs](api/java/org/apache/spark/ml/stat/FValueTest.html) for details on the API.
-
- {% include_example java/org/apache/spark/examples/ml/JavaFValueTestExample.java %}
- </div>
-
- <div data-lang="python" markdown="1">
- Refer to the [`FValueTest` Python docs](api/python/index.html#pyspark.ml.stat.FValueTest$) for details on the API.
-
- {% include_example python/ml/fvalue_test_example.py %}
- </div>
-
- </div>
-
 ## Summarizer
 
 We provide vector column summary statistics for `Dataframe` through `Summarizer`.
diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaANOVATestExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaANOVATestExample.java
deleted file mode 100644
index 4785dbd..0000000
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaANOVATestExample.java
+++ /dev/null
@@ -1,75 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.examples.ml;
-
-import org.apache.spark.sql.SparkSession;
-
-// $example on$
-import java.util.Arrays;
-import java.util.List;
-
-import org.apache.spark.ml.linalg.Vectors;
-import org.apache.spark.ml.linalg.VectorUDT;
-import org.apache.spark.ml.stat.ANOVATest;
-import org.apache.spark.sql.Dataset;
-import org.apache.spark.sql.Row;
-import org.apache.spark.sql.RowFactory;
-import org.apache.spark.sql.types.*;
-// $example off$
-
-/**
- * An example for ANOVA testing.
- * Run with
- * <pre>
- * bin/run-example ml.JavaANOVATestExample
- * </pre>
- */
-public class JavaANOVATestExample {
-
-  public static void main(String[] args) {
-    SparkSession spark = SparkSession
-      .builder()
-      .appName("JavaANOVATestExample")
-      .getOrCreate();
-
-    // $example on$
-    List<Row> data = Arrays.asList(
-      RowFactory.create(3.0, Vectors.dense(1.7, 4.4, 7.6, 5.8, 9.6, 2.3)),
-      RowFactory.create(2.0, Vectors.dense(8.8, 7.3, 5.7, 7.3, 2.2, 4.1)),
-      RowFactory.create(3.0, Vectors.dense(1.2, 9.5, 2.5, 3.1, 8.7, 2.5)),
-      RowFactory.create(2.0, Vectors.dense(3.7, 9.2, 6.1, 4.1, 7.5, 3.8)),
-      RowFactory.create(4.0, Vectors.dense(8.9, 5.2, 7.8, 8.3, 5.2, 3.0)),
-      RowFactory.create(4.0, Vectors.dense(7.9, 8.5, 9.2, 4.0, 9.4, 2.1))
-    );
-
-    StructType schema = new StructType(new StructField[]{
-      new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
-      new StructField("features", new VectorUDT(), false, Metadata.empty()),
-    });
-
-    Dataset<Row> df = spark.createDataFrame(data, schema);
-    Row r = ANOVATest.test(df, "features", "label").head();
-    System.out.println("pValues: " + r.get(0).toString());
-    System.out.println("degreesOfFreedom: " + r.getList(1).toString());
-    System.out.println("fValues: " + r.get(2).toString());
-
-    // $example off$
-
-    spark.stop();
-  }
-}
diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaFValueSelectorExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaFValueSelectorExample.java
deleted file mode 100644
index e8253ff..0000000
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaFValueSelectorExample.java
+++ /dev/null
@@ -1,81 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.examples.ml;
-
-import org.apache.spark.sql.Dataset;
-import org.apache.spark.sql.SparkSession;
-
-// $example on$
-import java.util.Arrays;
-import java.util.List;
-
-import org.apache.spark.ml.feature.FValueSelector;
-import org.apache.spark.ml.linalg.VectorUDT;
-import org.apache.spark.ml.linalg.Vectors;
-import org.apache.spark.sql.Row;
-import org.apache.spark.sql.RowFactory;
-import org.apache.spark.sql.types.*;
-// $example off$
-
-/**
- * An example demonstrating FValueSelector.
- * Run with
- * <pre>
- * bin/run-example ml.JavaFValueSelectorExample
- * </pre>
- */
-public class JavaFValueSelectorExample {
-  public static void main(String[] args) {
-    SparkSession spark = SparkSession
-      .builder()
-      .appName("JavaFValueSelectorExample")
-      .getOrCreate();
-
-    // $example on$
-    List<Row> data = Arrays.asList(
-      RowFactory.create(1, Vectors.dense(6.0, 7.0, 0.0, 7.0, 6.0, 0.0), 4.6),
-      RowFactory.create(2, Vectors.dense(0.0, 9.0, 6.0, 0.0, 5.0, 9.0), 6.6),
-      RowFactory.create(3, Vectors.dense(0.0, 9.0, 3.0, 0.0, 5.0, 5.0), 5.1),
-      RowFactory.create(4, Vectors.dense(0.0, 9.0, 8.0, 5.0, 6.0, 4.0), 7.6),
-      RowFactory.create(5, Vectors.dense(8.0, 9.0, 6.0, 5.0, 4.0, 4.0), 9.0),
-      RowFactory.create(6, Vectors.dense(8.0, 9.0, 6.0, 4.0, 0.0, 0.0), 9.0)
-    );
-    StructType schema = new StructType(new StructField[]{
-      new StructField("id", DataTypes.IntegerType, false, Metadata.empty()),
-      new StructField("features", new VectorUDT(), false, Metadata.empty()),
-      new StructField("label", DataTypes.DoubleType, false, Metadata.empty())
-    });
-
-    Dataset<Row> df = spark.createDataFrame(data, schema);
-
-    FValueSelector selector = new FValueSelector()
-      .setNumTopFeatures(1)
-      .setFeaturesCol("features")
-      .setLabelCol("label")
-      .setOutputCol("selectedFeatures");
-
-    Dataset<Row> result = selector.fit(df).transform(df);
-
-    System.out.println("FValueSelector output with top " + selector.getNumTopFeatures()
-        + " features selected");
-    result.show();
-
-    // $example off$
-    spark.stop();
-  }
-}
diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaFValueTestExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaFValueTestExample.java
deleted file mode 100644
index cda28db..0000000
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaFValueTestExample.java
+++ /dev/null
@@ -1,75 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.examples.ml;
-
-import org.apache.spark.sql.SparkSession;
-
-// $example on$
-import java.util.Arrays;
-import java.util.List;
-
-import org.apache.spark.ml.linalg.Vectors;
-import org.apache.spark.ml.linalg.VectorUDT;
-import org.apache.spark.ml.stat.FValueTest;
-import org.apache.spark.sql.Dataset;
-import org.apache.spark.sql.Row;
-import org.apache.spark.sql.RowFactory;
-import org.apache.spark.sql.types.*;
-// $example off$
-
-/**
- * An example for FValue testing.
- * Run with
- * <pre>
- * bin/run-example ml.JavaFValueTestExample
- * </pre>
- */
-public class JavaFValueTestExample {
-
-  public static void main(String[] args) {
-    SparkSession spark = SparkSession
-      .builder()
-      .appName("JavaFValueTestExample")
-      .getOrCreate();
-
-    // $example on$
-    List<Row> data = Arrays.asList(
-      RowFactory.create(4.6, Vectors.dense(6.0, 7.0, 0.0, 7.0, 6.0, 0.0)),
-      RowFactory.create(6.6, Vectors.dense(0.0, 9.0, 6.0, 0.0, 5.0, 9.0)),
-      RowFactory.create(5.1, Vectors.dense(0.0, 9.0, 3.0, 0.0, 5.0, 5.0)),
-      RowFactory.create(7.6, Vectors.dense(0.0, 9.0, 8.0, 5.0, 6.0, 4.0)),
-      RowFactory.create(9.0, Vectors.dense(8.0, 9.0, 6.0, 5.0, 4.0, 4.0)),
-      RowFactory.create(9.0, Vectors.dense(8.0, 9.0, 6.0, 4.0, 0.0, 0.0))
-    );
-
-    StructType schema = new StructType(new StructField[]{
-      new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
-      new StructField("features", new VectorUDT(), false, Metadata.empty()),
-    });
-
-    Dataset<Row> df = spark.createDataFrame(data, schema);
-    Row r = FValueTest.test(df, "features", "label").head();
-    System.out.println("pValues: " + r.get(0).toString());
-    System.out.println("degreesOfFreedom: " + r.getList(1).toString());
-    System.out.println("fvalues: " + r.get(2).toString());
-
-    // $example off$
-
-    spark.stop();
-  }
-}
diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaANOVASelectorExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaUnivariateFeatureSelectorExample.java
similarity index 79%
rename from examples/src/main/java/org/apache/spark/examples/ml/JavaANOVASelectorExample.java
rename to examples/src/main/java/org/apache/spark/examples/ml/JavaUnivariateFeatureSelectorExample.java
index 6f24b45..748262f 100644
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaANOVASelectorExample.java
+++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaUnivariateFeatureSelectorExample.java
@@ -24,7 +24,7 @@ import org.apache.spark.sql.SparkSession;
 import java.util.Arrays;
 import java.util.List;
 
-import org.apache.spark.ml.feature.ANOVASelector;
+import org.apache.spark.ml.feature.UnivariateFeatureSelector;
 import org.apache.spark.ml.linalg.VectorUDT;
 import org.apache.spark.ml.linalg.Vectors;
 import org.apache.spark.sql.Row;
@@ -33,17 +33,17 @@ import org.apache.spark.sql.types.*;
 // $example off$
 
 /**
- * An example for ANOVASelector.
+ * An example for UnivariateFeatureSelector.
  * Run with
  * <pre>
- * bin/run-example ml.JavaANOVASelectorExample
+ * bin/run-example ml.JavaUnivariateFeatureSelectorExample
  * </pre>
  */
-public class JavaANOVASelectorExample {
+public class JavaUnivariateFeatureSelectorExample {
   public static void main(String[] args) {
     SparkSession spark = SparkSession
       .builder()
-      .appName("JavaANOVASelectorExample")
+      .appName("JavaUnivariateFeatureSelectorExample")
       .getOrCreate();
 
     // $example on$
@@ -63,16 +63,19 @@ public class JavaANOVASelectorExample {
 
     Dataset<Row> df = spark.createDataFrame(data, schema);
 
-    ANOVASelector selector = new ANOVASelector()
-      .setNumTopFeatures(1)
+    UnivariateFeatureSelector selector = new UnivariateFeatureSelector()
+      .setFeatureType("continuous")
+      .setLabelType("categorical")
+      .setSelectionMode("numTopFeatures")
+      .setSelectionThreshold(1)
       .setFeaturesCol("features")
       .setLabelCol("label")
       .setOutputCol("selectedFeatures");
 
     Dataset<Row> result = selector.fit(df).transform(df);
 
-    System.out.println("ANOVASelector output with top " + selector.getNumTopFeatures()
-        + " features selected");
+    System.out.println("UnivariateFeatureSelector output with top "
+        + selector.getSelectionThreshold() + " features selected using f_classif");
     result.show();
 
     // $example off$
diff --git a/examples/src/main/python/ml/anova_test_example.py b/examples/src/main/python/ml/anova_test_example.py
deleted file mode 100644
index 451e078..0000000
--- a/examples/src/main/python/ml/anova_test_example.py
+++ /dev/null
@@ -1,50 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-"""
-An example for ANOVA testing.
-Run with:
-  bin/spark-submit examples/src/main/python/ml/anova_test_example.py
-"""
-from pyspark.sql import SparkSession
-# $example on$
-from pyspark.ml.linalg import Vectors
-from pyspark.ml.stat import ANOVATest
-# $example off$
-
-if __name__ == "__main__":
-    spark = SparkSession\
-        .builder\
-        .appName("ANOVATestExample")\
-        .getOrCreate()
-
-    # $example on$
-    data = [(3.0, Vectors.dense([1.7, 4.4, 7.6, 5.8, 9.6, 2.3])),
-            (2.0, Vectors.dense([8.8, 7.3, 5.7, 7.3, 2.2, 4.1])),
-            (3.0, Vectors.dense([1.2, 9.5, 2.5, 3.1, 8.7, 2.5])),
-            (2.0, Vectors.dense([3.7, 9.2, 6.1, 4.1, 7.5, 3.8])),
-            (4.0, Vectors.dense([8.9, 5.2, 7.8, 8.3, 5.2, 3.0])),
-            (4.0, Vectors.dense([7.9, 8.5, 9.2, 4.0, 9.4, 2.1]))]
-    df = spark.createDataFrame(data, ["label", "features"])
-
-    r = ANOVATest.test(df, "features", "label").head()
-    print("pValues: " + str(r.pValues))
-    print("degreesOfFreedom: " + str(r.degreesOfFreedom))
-    print("fValues: " + str(r.fValues))
-    # $example off$
-
-    spark.stop()
diff --git a/examples/src/main/python/ml/fvalue_selector_example.py b/examples/src/main/python/ml/fvalue_selector_example.py
deleted file mode 100644
index f164af4..0000000
--- a/examples/src/main/python/ml/fvalue_selector_example.py
+++ /dev/null
@@ -1,53 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-"""
-An example for FValueSelector.
-Run with:
-  bin/spark-submit examples/src/main/python/ml/fvalue_selector_example.py
-"""
-from pyspark.sql import SparkSession
-# $example on$
-from pyspark.ml.feature import FValueSelector
-from pyspark.ml.linalg import Vectors
-# $example off$
-
-if __name__ == "__main__":
-    spark = SparkSession\
-        .builder\
-        .appName("FValueSelectorExample")\
-        .getOrCreate()
-
-    # $example on$
-    df = spark.createDataFrame([
-        (1, Vectors.dense([6.0, 7.0, 0.0, 7.0, 6.0, 0.0]), 4.6,),
-        (2, Vectors.dense([0.0, 9.0, 6.0, 0.0, 5.0, 9.0]), 6.6,),
-        (3, Vectors.dense([0.0, 9.0, 3.0, 0.0, 5.0, 5.0]), 5.1,),
-        (4, Vectors.dense([0.0, 9.0, 8.0, 5.0, 6.0, 4.0]), 7.6,),
-        (5, Vectors.dense([8.0, 9.0, 6.0, 5.0, 4.0, 4.0]), 9.0,),
-        (6, Vectors.dense([8.0, 9.0, 6.0, 4.0, 0.0, 0.0]), 9.0,)], ["id", "features", "label"])
-
-    selector = FValueSelector(numTopFeatures=1, featuresCol="features",
-                              outputCol="selectedFeatures", labelCol="label")
-
-    result = selector.fit(df).transform(df)
-
-    print("FValueSelector output with top %d features selected" % selector.getNumTopFeatures())
-    result.show()
-    # $example off$
-
-    spark.stop()
diff --git a/examples/src/main/python/ml/fvalue_test_example.py b/examples/src/main/python/ml/fvalue_test_example.py
deleted file mode 100644
index dfa8073..0000000
--- a/examples/src/main/python/ml/fvalue_test_example.py
+++ /dev/null
@@ -1,50 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-"""
-An example for FValue testing.
-Run with:
-  bin/spark-submit examples/src/main/python/ml/fvalue_test_example.py
-"""
-from pyspark.sql import SparkSession
-# $example on$
-from pyspark.ml.linalg import Vectors
-from pyspark.ml.stat import FValueTest
-# $example off$
-
-if __name__ == "__main__":
-    spark = SparkSession \
-        .builder \
-        .appName("FValueTestExample") \
-        .getOrCreate()
-
-    # $example on$
-    data = [(4.6, Vectors.dense(6.0, 7.0, 0.0, 7.0, 6.0, 0.0)),
-            (6.6, Vectors.dense(0.0, 9.0, 6.0, 0.0, 5.0, 9.0)),
-            (5.1, Vectors.dense(0.0, 9.0, 3.0, 0.0, 5.0, 5.0)),
-            (7.6, Vectors.dense(0.0, 9.0, 8.0, 5.0, 6.0, 4.0)),
-            (9.0, Vectors.dense(8.0, 9.0, 6.0, 5.0, 4.0, 4.0)),
-            (9.0, Vectors.dense(8.0, 9.0, 6.0, 4.0, 0.0, 0.0))]
-    df = spark.createDataFrame(data, ["label", "features"])
-
-    ftest = FValueTest.test(df, "features", "label").head()
-    print("pValues: " + str(ftest.pValues))
-    print("degreesOfFreedom: " + str(ftest.degreesOfFreedom))
-    print("fvalues: " + str(ftest.fValues))
-    # $example off$
-
-    spark.stop()
diff --git a/examples/src/main/python/ml/anova_selector_example.py b/examples/src/main/python/ml/univariate_feature_selector_example.py
similarity index 70%
rename from examples/src/main/python/ml/anova_selector_example.py
rename to examples/src/main/python/ml/univariate_feature_selector_example.py
index da80fa6..6dc293e 100644
--- a/examples/src/main/python/ml/anova_selector_example.py
+++ b/examples/src/main/python/ml/univariate_feature_selector_example.py
@@ -16,20 +16,20 @@
 #
 
 """
-An example for ANOVASelector.
+An example for UnivariateFeatureSelector.
 Run with:
-  bin/spark-submit examples/src/main/python/ml/anova_selector_example.py
+  bin/spark-submit examples/src/main/python/ml/univariate_feature_selector_example.py
 """
 from pyspark.sql import SparkSession
 # $example on$
-from pyspark.ml.feature import ANOVASelector
+from pyspark.ml.feature import UnivariateFeatureSelector
 from pyspark.ml.linalg import Vectors
 # $example off$
 
 if __name__ == "__main__":
     spark = SparkSession\
         .builder\
-        .appName("ANOVASelectorExample")\
+        .appName("UnivariateFeatureSelectorExample")\
         .getOrCreate()
 
     # $example on$
@@ -41,12 +41,14 @@ if __name__ == "__main__":
         (5, Vectors.dense([8.9, 5.2, 7.8, 8.3, 5.2, 3.0]), 4.0,),
         (6, Vectors.dense([7.9, 8.5, 9.2, 4.0, 9.4, 2.1]), 4.0,)], ["id", "features", "label"])
 
-    selector = ANOVASelector(numTopFeatures=1, featuresCol="features",
-                             outputCol="selectedFeatures", labelCol="label")
+    selector = UnivariateFeatureSelector(featuresCol="features", outputCol="selectedFeatures",
+                                         labelCol="label", selectionMode="numTopFeatures")
+    selector.setFeatureType("continuous").setLabelType("categorical").setSelectionThreshold(1)
 
     result = selector.fit(df).transform(df)
 
-    print("ANOVASelector output with top %d features selected" % selector.getNumTopFeatures())
+    print("UnivariateFeatureSelector output with top %d features selected using f_classif"
+          % selector.getSelectionThreshold())
     result.show()
     # $example off$
 
diff --git a/examples/src/main/scala/org/apache/spark/examples/ml/ANOVATestExample.scala b/examples/src/main/scala/org/apache/spark/examples/ml/ANOVATestExample.scala
deleted file mode 100644
index f0b9f23..0000000
--- a/examples/src/main/scala/org/apache/spark/examples/ml/ANOVATestExample.scala
+++ /dev/null
@@ -1,63 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-// scalastyle:off println
-package org.apache.spark.examples.ml
-
-// $example on$
-import org.apache.spark.ml.linalg.{Vector, Vectors}
-import org.apache.spark.ml.stat.ANOVATest
-// $example off$
-import org.apache.spark.sql.SparkSession
-
-/**
- * An example for ANOVA testing.
- * Run with
- * {{{
- * bin/run-example ml.ANOVATestExample
- * }}}
- */
-object ANOVATestExample {
-
-  def main(args: Array[String]): Unit = {
-    val spark = SparkSession
-      .builder
-      .appName("ANOVATestExample")
-      .getOrCreate()
-    import spark.implicits._
-
-    // $example on$
-    val data = Seq(
-      (3.0, Vectors.dense(1.7, 4.4, 7.6, 5.8, 9.6, 2.3)),
-      (2.0, Vectors.dense(8.8, 7.3, 5.7, 7.3, 2.2, 4.1)),
-      (3.0, Vectors.dense(1.2, 9.5, 2.5, 3.1, 8.7, 2.5)),
-      (2.0, Vectors.dense(3.7, 9.2, 6.1, 4.1, 7.5, 3.8)),
-      (4.0, Vectors.dense(8.9, 5.2, 7.8, 8.3, 5.2, 3.0)),
-      (4.0, Vectors.dense(7.9, 8.5, 9.2, 4.0, 9.4, 2.1))
-    )
-
-    val df = data.toDF("label", "features")
-    val anova = ANOVATest.test(df, "features", "label").head
-    println(s"pValues = ${anova.getAs[Vector](0)}")
-    println(s"degreesOfFreedom ${anova.getSeq[Int](1).mkString("[", ",", "]")}")
-    println(s"fValues ${anova.getAs[Vector](2)}")
-    // $example off$
-
-    spark.stop()
-  }
-}
-// scalastyle:on println
diff --git a/examples/src/main/scala/org/apache/spark/examples/ml/FValueSelectorExample.scala b/examples/src/main/scala/org/apache/spark/examples/ml/FValueSelectorExample.scala
deleted file mode 100644
index 914d81b..0000000
--- a/examples/src/main/scala/org/apache/spark/examples/ml/FValueSelectorExample.scala
+++ /dev/null
@@ -1,69 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-// scalastyle:off println
-package org.apache.spark.examples.ml
-
-// $example on$
-import org.apache.spark.ml.feature.FValueSelector
-import org.apache.spark.ml.linalg.Vectors
-// $example off$
-import org.apache.spark.sql.SparkSession
-
-/**
- * An example for FValueSelector.
- * Run with
- * {{{
- * bin/run-example ml.FValueSelectorExample
- * }}}
- */
-object FValueSelectorExample {
-  def main(args: Array[String]): Unit = {
-    val spark = SparkSession
-      .builder
-      .appName("FValueSelectorExample")
-      .getOrCreate()
-    import spark.implicits._
-
-    // $example on$
-    val data = Seq(
-      (1, Vectors.dense(6.0, 7.0, 0.0, 7.0, 6.0, 0.0), 4.6),
-      (2, Vectors.dense(0.0, 9.0, 6.0, 0.0, 5.0, 9.0), 6.6),
-      (3, Vectors.dense(0.0, 9.0, 3.0, 0.0, 5.0, 5.0), 5.1),
-      (4, Vectors.dense(0.0, 9.0, 8.0, 5.0, 6.0, 4.0), 7.6),
-      (5, Vectors.dense(8.0, 9.0, 6.0, 5.0, 4.0, 4.0), 9.0),
-      (6, Vectors.dense(8.0, 9.0, 6.0, 4.0, 0.0, 0.0), 9.0)
-    )
-
-    val df = spark.createDataset(data).toDF("id", "features", "label")
-
-    val selector = new FValueSelector()
-      .setNumTopFeatures(1)
-      .setFeaturesCol("features")
-      .setLabelCol("label")
-      .setOutputCol("selectedFeatures")
-
-    val result = selector.fit(df).transform(df)
-
-    println(s"FValueSelector output with top ${selector.getNumTopFeatures} features selected")
-    result.show()
-    // $example off$
-
-    spark.stop()
-  }
-}
-// scalastyle:on println
diff --git a/examples/src/main/scala/org/apache/spark/examples/ml/FValueTestExample.scala b/examples/src/main/scala/org/apache/spark/examples/ml/FValueTestExample.scala
deleted file mode 100644
index 08ec22c..0000000
--- a/examples/src/main/scala/org/apache/spark/examples/ml/FValueTestExample.scala
+++ /dev/null
@@ -1,63 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-// scalastyle:off println
-package org.apache.spark.examples.ml
-
-// $example on$
-import org.apache.spark.ml.linalg.{Vector, Vectors}
-import org.apache.spark.ml.stat.FValueTest
-// $example off$
-import org.apache.spark.sql.SparkSession
-
-/**
- * An example for FValue testing.
- * Run with
- * {{{
- * bin/run-example ml.FValueTestExample
- * }}}
- */
-object FValueTestExample {
-
-  def main(args: Array[String]): Unit = {
-    val spark = SparkSession
-      .builder
-      .appName("FValueTestExample")
-      .getOrCreate()
-    import spark.implicits._
-
-    // $example on$
-    val data = Seq(
-      (4.6, Vectors.dense(6.0, 7.0, 0.0, 7.0, 6.0, 0.0)),
-      (6.6, Vectors.dense(0.0, 9.0, 6.0, 0.0, 5.0, 9.0)),
-      (5.1, Vectors.dense(0.0, 9.0, 3.0, 0.0, 5.0, 5.0)),
-      (7.6, Vectors.dense(0.0, 9.0, 8.0, 5.0, 6.0, 4.0)),
-      (9.0, Vectors.dense(8.0, 9.0, 6.0, 5.0, 4.0, 4.0)),
-      (9.0, Vectors.dense(8.0, 9.0, 6.0, 4.0, 0.0, 0.0))
-    )
-
-    val df = data.toDF("label", "features")
-    val fValue = FValueTest.test(df, "features", "label").head
-    println(s"pValues ${fValue.getAs[Vector](0)}")
-    println(s"degreesOfFreedom ${fValue.getSeq[Int](1).mkString("[", ",", "]")}")
-    println(s"fValues ${fValue.getAs[Vector](2)}")
-    // $example off$
-
-    spark.stop()
-  }
-}
-// scalastyle:on println
diff --git a/examples/src/main/scala/org/apache/spark/examples/ml/ANOVASelectorExample.scala b/examples/src/main/scala/org/apache/spark/examples/ml/UnivariateFeatureSelectorExample.scala
similarity index 76%
rename from examples/src/main/scala/org/apache/spark/examples/ml/ANOVASelectorExample.scala
rename to examples/src/main/scala/org/apache/spark/examples/ml/UnivariateFeatureSelectorExample.scala
index 46803cc..e4932db 100644
--- a/examples/src/main/scala/org/apache/spark/examples/ml/ANOVASelectorExample.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/ml/UnivariateFeatureSelectorExample.scala
@@ -19,23 +19,23 @@
 package org.apache.spark.examples.ml
 
 // $example on$
-import org.apache.spark.ml.feature.ANOVASelector
+import org.apache.spark.ml.feature.UnivariateFeatureSelector
 import org.apache.spark.ml.linalg.Vectors
 // $example off$
 import org.apache.spark.sql.SparkSession
 
 /**
- * An example for ANOVASelector.
+ * An example for UnivariateFeatureSelector.
  * Run with
  * {{{
- * bin/run-example ml.ANOVASelectorExample
+ * bin/run-example ml.UnivariateFeatureSelectorExample
  * }}}
  */
-object ANOVASelectorExample {
+object UnivariateFeatureSelectorExample {
   def main(args: Array[String]): Unit = {
     val spark = SparkSession
       .builder
-      .appName("ANOVASelectorExample")
+      .appName("UnivariateFeatureSelectorExample")
       .getOrCreate()
     import spark.implicits._
 
@@ -51,15 +51,19 @@ object ANOVASelectorExample {
 
     val df = spark.createDataset(data).toDF("id", "features", "label")
 
-    val selector = new ANOVASelector()
-      .setNumTopFeatures(1)
+    val selector = new UnivariateFeatureSelector()
+      .setFeatureType("continuous")
+      .setLabelType("categorical")
+      .setSelectionMode("numTopFeatures")
+      .setSelectionThreshold(1)
       .setFeaturesCol("features")
       .setLabelCol("label")
       .setOutputCol("selectedFeatures")
 
     val result = selector.fit(df).transform(df)
 
-    println(s"ANOVASelector output with top ${selector.getNumTopFeatures} features selected")
+    println(s"UnivariateFeatureSelector output with top ${selector.getSelectionThreshold}" +
+      s" features selected using f_classif")
     result.show()
     // $example off$
 
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/ANOVASelector.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/ANOVASelector.scala
deleted file mode 100644
index 81ffd01..0000000
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/ANOVASelector.scala
+++ /dev/null
@@ -1,195 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.ml.feature
-
-import org.apache.hadoop.fs.Path
-
-import org.apache.spark.annotation.Since
-import org.apache.spark.ml.param._
-import org.apache.spark.ml.stat.ANOVATest
-import org.apache.spark.ml.util._
-import org.apache.spark.sql.{DataFrame, Dataset}
-
-
-/**
- * ANOVA F-value Classification selector, which selects continuous features to use for predicting a
- * categorical label.
- * The selector supports different selection methods: `numTopFeatures`, `percentile`, `fpr`,
- * `fdr`, `fwe`.
- *  - `numTopFeatures` chooses a fixed number of top features according to a F value classification
- *     test.
- *  - `percentile` is similar but chooses a fraction of all features instead of a fixed number.
- *  - `fpr` chooses all features whose p-value are below a threshold, thus controlling the false
- *    positive rate of selection.
- *  - `fdr` uses the [Benjamini-Hochberg procedure]
- *    (https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure)
- *    to choose all features whose false discovery rate is below a threshold.
- *  - `fwe` chooses all features whose p-values are below a threshold. The threshold is scaled by
- *    1/numFeatures, thus controlling the family-wise error rate of selection.
- * By default, the selection method is `numTopFeatures`, with the default number of top features
- * set to 50.
- */
-@Since("3.1.0")
-final class ANOVASelector @Since("3.1.0")(@Since("3.1.0") override val uid: String)
-  extends Selector[ANOVASelectorModel] {
-
-  @Since("3.1.0")
-  def this() = this(Identifiable.randomUID("ANOVASelector"))
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setNumTopFeatures(value: Int): this.type = super.setNumTopFeatures(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setPercentile(value: Double): this.type = super.setPercentile(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setFpr(value: Double): this.type = super.setFpr(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setFdr(value: Double): this.type = super.setFdr(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setFwe(value: Double): this.type = super.setFwe(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setSelectorType(value: String): this.type = super.setSelectorType(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setFeaturesCol(value: String): this.type = super.setFeaturesCol(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setOutputCol(value: String): this.type = super.setOutputCol(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setLabelCol(value: String): this.type = super.setLabelCol(value)
-
-  /**
-   * get the SelectionTestResult for every feature against the label
-   */
-  protected[this] override def getSelectionTestResult(df: DataFrame): DataFrame = {
-    ANOVATest.test(df, getFeaturesCol, getLabelCol, true)
-  }
-
-  /**
-   * Create a new instance of concrete SelectorModel.
-   * @param indices The indices of the selected features
-   * @return A new SelectorModel instance
-   */
-  protected[this] def createSelectorModel(
-      uid: String,
-      indices: Array[Int]): ANOVASelectorModel = {
-    new ANOVASelectorModel(uid, indices)
-  }
-
-  @Since("3.1.0")
-  override def fit(dataset: Dataset[_]): ANOVASelectorModel = {
-    super.fit(dataset)
-  }
-
-  @Since("3.1.0")
-  override def copy(extra: ParamMap): this.type = defaultCopy(extra)
-}
-
-@Since("3.1.0")
-object ANOVASelector extends DefaultParamsReadable[ANOVASelector] {
-
-  @Since("3.1.0")
-  override def load(path: String): ANOVASelector = super.load(path)
-}
-
-/**
- * Model fitted by [[ANOVASelector]].
- */
-@Since("3.1.0")
-class ANOVASelectorModel private[ml](
-    @Since("3.1.0") override val uid: String,
-    @Since("3.1.0") override val selectedFeatures: Array[Int])
-  extends SelectorModel[ANOVASelectorModel] (uid, selectedFeatures) {
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setFeaturesCol(value: String): this.type = super.setFeaturesCol(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setOutputCol(value: String): this.type = super.setOutputCol(value)
-
-  @Since("3.1.0")
-  override def copy(extra: ParamMap): ANOVASelectorModel = {
-    val copied = new ANOVASelectorModel(uid, selectedFeatures)
-      .setParent(parent)
-    copyValues(copied, extra)
-  }
-
-  @Since("3.1.0")
-  override def write: MLWriter = new ANOVASelectorModel.ANOVASelectorModelWriter(this)
-
-  @Since("3.1.0")
-  override def toString: String = {
-    s"ANOVASelectorModel: uid=$uid, numSelectedFeatures=${selectedFeatures.length}"
-  }
-}
-
-@Since("3.1.0")
-object ANOVASelectorModel extends MLReadable[ANOVASelectorModel] {
-
-  @Since("3.1.0")
-  override def read: MLReader[ANOVASelectorModel] = new ANOVASelectorModelReader
-
-  @Since("3.1.0")
-  override def load(path: String): ANOVASelectorModel = super.load(path)
-
-  private[ANOVASelectorModel] class ANOVASelectorModelWriter(
-      instance: ANOVASelectorModel) extends MLWriter {
-
-    private case class Data(selectedFeatures: Seq[Int])
-
-    override protected def saveImpl(path: String): Unit = {
-      DefaultParamsWriter.saveMetadata(instance, path, sc)
-      val data = Data(instance.selectedFeatures.toSeq)
-      val dataPath = new Path(path, "data").toString
-      sparkSession.createDataFrame(Seq(data)).repartition(1).write.parquet(dataPath)
-    }
-  }
-
-  private class ANOVASelectorModelReader extends MLReader[ANOVASelectorModel] {
-
-    /** Checked against metadata when loading model */
-    private val className = classOf[ANOVASelectorModel].getName
-
-    override def load(path: String): ANOVASelectorModel = {
-      val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
-      val dataPath = new Path(path, "data").toString
-      val data = sparkSession.read.parquet(dataPath)
-        .select("selectedFeatures").head()
-      val selectedFeatures = data.getAs[Seq[Int]](0).toArray
-      val model = new ANOVASelectorModel(metadata.uid, selectedFeatures)
-      metadata.getAndSetParams(model)
-      model
-    }
-  }
-}
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala
index 7f83b69..198a886 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala
@@ -44,6 +44,7 @@ import org.apache.spark.sql.types.StructType
  * By default, the selection method is `numTopFeatures`, with the default number of top features
  * set to 50.
  */
+@deprecated("use UnivariateFeatureSelector instead", "3.1.0")
 @Since("1.6.0")
 final class ChiSqSelector @Since("1.6.0") (@Since("1.6.0") override val uid: String)
   extends Selector[ChiSqSelectorModel] {
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/FValueSelector.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/FValueSelector.scala
deleted file mode 100644
index d177555..0000000
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/FValueSelector.scala
+++ /dev/null
@@ -1,195 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.ml.feature
-
-import org.apache.hadoop.fs.Path
-
-import org.apache.spark.annotation.Since
-import org.apache.spark.ml.param.ParamMap
-import org.apache.spark.ml.stat.FValueTest
-import org.apache.spark.ml.util._
-import org.apache.spark.sql.{DataFrame, Dataset}
-
-
-/**
- * F Value Regression feature selector, which selects continuous features to use for predicting a
- * continuous label.
- * The selector supports different selection methods: `numTopFeatures`, `percentile`, `fpr`,
- * `fdr`, `fwe`.
- *  - `numTopFeatures` chooses a fixed number of top features according to a F value regression
- *  test.
- *  - `percentile` is similar but chooses a fraction of all features instead of a fixed number.
- *  - `fpr` chooses all features whose p-value are below a threshold, thus controlling the false
- *    positive rate of selection.
- *  - `fdr` uses the [Benjamini-Hochberg procedure]
- *    (https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure)
- *    to choose all features whose false discovery rate is below a threshold.
- *  - `fwe` chooses all features whose p-values are below a threshold. The threshold is scaled by
- *    1/numFeatures, thus controlling the family-wise error rate of selection.
- * By default, the selection method is `numTopFeatures`, with the default number of top features
- * set to 50.
- */
-@Since("3.1.0")
-final class FValueSelector @Since("3.1.0") (@Since("3.1.0") override val uid: String) extends
-  Selector[FValueSelectorModel] {
-
-  @Since("3.1.0")
-  def this() = this(Identifiable.randomUID("FValueSelector"))
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setNumTopFeatures(value: Int): this.type = super.setNumTopFeatures(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setPercentile(value: Double): this.type = super.setPercentile(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setFpr(value: Double): this.type = super.setFpr(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setFdr(value: Double): this.type = super.setFdr(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setFwe(value: Double): this.type = super.setFwe(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setSelectorType(value: String): this.type = super.setSelectorType(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setFeaturesCol(value: String): this.type = super.setFeaturesCol(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setOutputCol(value: String): this.type = super.setOutputCol(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setLabelCol(value: String): this.type = super.setLabelCol(value)
-
-  /**
-   * get the SelectionTestResult for every feature against the label
-   */
-  protected[this] override def getSelectionTestResult(df: DataFrame): DataFrame = {
-    FValueTest.test(df, getFeaturesCol, getLabelCol, true)
-  }
-
-  /**
-   * Create a new instance of concrete SelectorModel.
-   * @param indices The indices of the selected features
-   * @return A new SelectorModel instance
-   */
-  protected[this] def createSelectorModel(
-      uid: String,
-      indices: Array[Int]): FValueSelectorModel = {
-    new FValueSelectorModel(uid, indices)
-  }
-
-  @Since("3.1.0")
-  override def fit(dataset: Dataset[_]): FValueSelectorModel = {
-    super.fit(dataset)
-  }
-
-  @Since("3.1.0")
-  override def copy(extra: ParamMap): this.type = defaultCopy(extra)
-}
-
-@Since("3.1.0")
-object FValueSelector extends DefaultParamsReadable[FValueSelector] {
-
-  @Since("3.1.0")
-  override def load(path: String): FValueSelector = super.load(path)
-}
-
-/**
- * Model fitted by [[FValueSelector]]
- */
-@Since("3.1.0")
-class FValueSelectorModel private[ml](
-    @Since("3.1.0") override val uid: String,
-    @Since("3.1.0") override val selectedFeatures: Array[Int])
-  extends SelectorModel[FValueSelectorModel] (uid, selectedFeatures) {
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setFeaturesCol(value: String): this.type = super.setFeaturesCol(value)
-
-  /** @group setParam */
-  @Since("3.1.0")
-  override def setOutputCol(value: String): this.type = super.setOutputCol(value)
-
-  @Since("3.1.0")
-  override def copy(extra: ParamMap): FValueSelectorModel = {
-    val copied = new FValueSelectorModel(uid, selectedFeatures)
-      .setParent(parent)
-    copyValues(copied, extra)
-  }
-
-  @Since("3.1.0")
-  override def write: MLWriter = new FValueSelectorModel.FValueSelectorModelWriter(this)
-
-  @Since("3.1.0")
-  override def toString: String = {
-    s"FValueSelectorModel: uid=$uid, numSelectedFeatures=${selectedFeatures.length}"
-  }
-}
-
-@Since("3.1.0")
-object FValueSelectorModel extends MLReadable[FValueSelectorModel] {
-
-  @Since("3.1.0")
-  override def read: MLReader[FValueSelectorModel] = new FValueSelectorModelReader
-
-  @Since("3.1.0")
-  override def load(path: String): FValueSelectorModel = super.load(path)
-
-  private[FValueSelectorModel] class FValueSelectorModelWriter(
-      instance: FValueSelectorModel) extends MLWriter {
-
-    private case class Data(selectedFeatures: Seq[Int])
-
-    override protected def saveImpl(path: String): Unit = {
-      DefaultParamsWriter.saveMetadata(instance, path, sc)
-      val data = Data(instance.selectedFeatures.toSeq)
-      val dataPath = new Path(path, "data").toString
-      sparkSession.createDataFrame(Seq(data)).repartition(1).write.parquet(dataPath)
-    }
-  }
-
-  private class FValueSelectorModelReader extends MLReader[FValueSelectorModel] {
-
-    /** Checked against metadata when loading model */
-    private val className = classOf[FValueSelectorModel].getName
-
-    override def load(path: String): FValueSelectorModel = {
-      val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
-      val dataPath = new Path(path, "data").toString
-      val data = sparkSession.read.parquet(dataPath)
-        .select("selectedFeatures").head()
-      val selectedFeatures = data.getAs[Seq[Int]](0).toArray
-      val model = new FValueSelectorModel(metadata.uid, selectedFeatures)
-      metadata.getAndSetParams(model)
-      model
-    }
-  }
-}
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Selector.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/Selector.scala
index 41de26d..cb8b71a 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/Selector.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Selector.scala
@@ -133,10 +133,6 @@ private[feature] trait SelectorParams extends Params
  * Super class for feature selectors.
  * 1. Chi-Square Selector
  * This feature selector is for categorical features and categorical labels.
- * 2. ANOVA F-value Classification Selector
- * This feature selector is for continuous features and categorical labels.
- * 3. Regression F-value Selector
- * This feature selector is for continuous features and continuous labels.
  * The selector supports different selection methods: `numTopFeatures`, `percentile`, `fpr`,
  * `fdr`, `fwe`.
  *  - `numTopFeatures` chooses a fixed number of top features according to a hypothesis.
@@ -279,11 +275,6 @@ private[ml] abstract class SelectorModel[T <: SelectorModel[T]] (
   extends Model[T] with SelectorParams with MLWritable {
   self: T =>
 
-  if (selectedFeatures.length >= 2) {
-    require(selectedFeatures.sliding(2).forall(l => l(0) < l(1)),
-      "Index should be strictly increasing.")
-  }
-
   /** @group setParam */
   @Since("3.1.0")
   def setFeaturesCol(value: String): this.type = set(featuresCol, value)
@@ -298,7 +289,8 @@ private[ml] abstract class SelectorModel[T <: SelectorModel[T]] (
   override def transform(dataset: Dataset[_]): DataFrame = {
     val outputSchema = transformSchema(dataset.schema, logging = true)
 
-    SelectorModel.transform(dataset, selectedFeatures, outputSchema, $(outputCol), $(featuresCol))
+    SelectorModel.transform(dataset, selectedFeatures.sorted, outputSchema,
+      $(outputCol), $(featuresCol))
   }
 
   @Since("3.1.0")
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.scala
new file mode 100644
index 0000000..6d5f09e
--- /dev/null
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.scala
@@ -0,0 +1,467 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import scala.collection.mutable.ArrayBuilder
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.attribute.{Attribute, AttributeGroup, NominalAttribute, NumericAttribute}
+import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors, VectorUDT}
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared.{HasFeaturesCol, HasLabelCol, HasOutputCol}
+import org.apache.spark.ml.stat.{ANOVATest, ChiSquareTest, FValueTest}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
+import org.apache.spark.sql.functions.{col, udf}
+import org.apache.spark.sql.types.{StructField, StructType}
+
+
+/**
+ * Params for [[UnivariateFeatureSelector]] and [[UnivariateFeatureSelectorModel]].
+ */
+private[feature] trait UnivariateFeatureSelectorParams extends Params
+  with HasFeaturesCol with HasLabelCol with HasOutputCol {
+
+  /**
+   * The feature type.
+   * Supported options: "categorical", "continuous"
+   * @group param
+   */
+  @Since("3.1.1")
+  final val featureType = new Param[String](this, "featureType",
+    "Feature type. Supported options: categorical, continuous.",
+    ParamValidators.inArray(Array("categorical", "continuous")))
+
+  /** @group getParam */
+  @Since("3.1.1")
+  def getFeatureType: String = $(featureType)
+
+  /**
+   * The label type.
+   * Supported options: "categorical", "continuous"
+   * @group param
+   */
+  @Since("3.1.1")
+  final val labelType = new Param[String](this, "labelType",
+    "Label type. Supported options: categorical, continuous.",
+    ParamValidators.inArray(Array("categorical", "continuous")))
+
+  /** @group getParam */
+  @Since("3.1.1")
+  def getLabelType: String = $(labelType)
+
+  /**
+   * The selection mode.
+   * Supported options: "numTopFeatures" (default), "percentile", "fpr", "fdr", "fwe"
+   * @group param
+   */
+  @Since("3.1.1")
+  final val selectionMode = new Param[String](this, "selectionMode",
+    "The selection mode. Supported options: numTopFeatures, percentile, fpr, fdr, fwe",
+    ParamValidators.inArray(Array("numTopFeatures", "percentile", "fpr", "fdr",
+      "fwe")))
+
+  /** @group getParam */
+  @Since("3.1.1")
+  def getSelectionMode: String = $(selectionMode)
+
+  /**
+   * The upper bound of the number of features that the selector will select.
+   * @group param
+   */
+  @Since("3.1.1")
+  final val selectionThreshold = new DoubleParam(this, "selectionThreshold",
+    "The upper bound of the features that selector will select.")
+
+  /** @group getParam */
+  def getSelectionThreshold: Double = $(selectionThreshold)
+
+  setDefault(selectionMode -> "numTopFeatures")
+}
+
+/**
+ * The user can set `featureType` and `labelType`, and Spark will pick the score function based on
+ * the specified `featureType` and `labelType`.
+ * The following combinations of `featureType` and `labelType` are supported:
+ *  - `featureType` `categorical` and `labelType` `categorical`: Spark uses chi2.
+ *  - `featureType` `continuous` and `labelType` `categorical`: Spark uses f_classif.
+ *  - `featureType` `continuous` and `labelType` `continuous`: Spark uses f_regression.
+ *
+ * The `UnivariateFeatureSelector` supports different selection modes: `numTopFeatures`,
+ * `percentile`, `fpr`, `fdr`, `fwe`.
+ *  - `numTopFeatures` chooses a fixed number of top features according to a hypothesis.
+ *  - `percentile` is similar but chooses a fraction of all features instead of a fixed number.
+ *  - `fpr` chooses all features whose p-values are below a threshold, thus controlling the false
+ *    positive rate of selection.
+ *  - `fdr` uses the <a href=
+ *  "https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure">
+ *  Benjamini-Hochberg procedure</a>
+ *    to choose all features whose false discovery rate is below a threshold.
+ *  - `fwe` chooses all features whose p-values are below a threshold. The threshold is scaled by
+ *    1/numFeatures, thus controlling the family-wise error rate of selection.
+ *
+ * By default, the selection mode is `numTopFeatures`.
+ */
+@Since("3.1.1")
+final class UnivariateFeatureSelector @Since("3.1.1")(@Since("3.1.1") override val uid: String)
+  extends Estimator[UnivariateFeatureSelectorModel] with UnivariateFeatureSelectorParams
+    with DefaultParamsWritable {
+
+  @Since("3.1.1")
+  def this() = this(Identifiable.randomUID("UnivariateFeatureSelector"))
+
+  /** @group setParam */
+  @Since("3.1.1")
+  def setSelectionMode(value: String): this.type = set(selectionMode, value)
+
+  /** @group setParam */
+  @Since("3.1.1")
+  def setSelectionThreshold(value: Double): this.type = set(selectionThreshold, value)
+
+  /** @group setParam */
+  @Since("3.1.1")
+  def setFeatureType(value: String): this.type = set(featureType, value)
+
+  /** @group setParam */
+  @Since("3.1.1")
+  def setLabelType(value: String): this.type = set(labelType, value)
+
+  /** @group setParam */
+  @Since("3.1.1")
+  def setFeaturesCol(value: String): this.type = set(featuresCol, value)
+
+  /** @group setParam */
+  @Since("3.1.1")
+  def setOutputCol(value: String): this.type = set(outputCol, value)
+
+  /** @group setParam */
+  @Since("3.1.1")
+  def setLabelCol(value: String): this.type = set(labelCol, value)
+
+  @Since("3.1.1")
+  override def fit(dataset: Dataset[_]): UnivariateFeatureSelectorModel = {
+    transformSchema(dataset.schema, logging = true)
+    val numFeatures = MetadataUtils.getNumFeatures(dataset, $(featuresCol))
+
+    $(selectionMode) match {
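+      // Resolve a per-mode default for selectionThreshold, or validate a user-supplied value.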
+      case ("numTopFeatures") =>
+        if (!isSet(selectionThreshold)) {
+          set(selectionThreshold, 50.0)
+        } else {
+          require($(selectionThreshold) > 0 && $(selectionThreshold).toInt == $(selectionThreshold),
+            "selectionThreshold needs to be a positive Integer for selection mode numTopFeatures")
+        }
+      case ("percentile") =>
+        if (!isSet(selectionThreshold)) {
+          set(selectionThreshold, 0.1)
+        } else {
+          require($(selectionThreshold) >= 0 && $(selectionThreshold) <= 1,
+            "selectionThreshold needs to be in the range of 0 to 1 for selection mode percentile")
+        }
+      case ("fpr") =>
+        if (!isSet(selectionThreshold)) {
+          set(selectionThreshold, 0.05)
+        } else {
+          require($(selectionThreshold) >= 0 && $(selectionThreshold) <= 1,
+            "selectionThreshold needs to be in the range of 0 to 1 for selection mode fpr")
+        }
+      case ("fdr") =>
+        if (!isSet(selectionThreshold)) {
+          set(selectionThreshold, 0.05)
+        } else {
+          require($(selectionThreshold) >= 0 && $(selectionThreshold) <= 1,
+            "selectionThreshold needs to be in the range of 0 to 1 for selection mode fdr")
+        }
+      case ("fwe") =>
+        if (!isSet(selectionThreshold)) {
+          set(selectionThreshold, 0.05)
+        } else {
+          require($(selectionThreshold) >= 0 && $(selectionThreshold) <= 1,
+            "selectionThreshold needs to be in the range of 0 to 1 for selection mode fwe")
+        }
+      case _ =>
+        throw new IllegalArgumentException(s"Unsupported selection mode:" +
+          s" selectionMode=${$(selectionMode)}")
+    }
+
+    require(isSet(featureType) && isSet(labelType), "featureType and labelType need to be set")
+    val resultDF = ($(featureType), $(labelType)) match {
+      case ("categorical", "categorical") =>
+        ChiSquareTest.test(dataset.toDF, getFeaturesCol, getLabelCol, true)
+      case ("continuous", "categorical") =>
+        ANOVATest.test(dataset.toDF, getFeaturesCol, getLabelCol, true)
+      case ("continuous", "continuous") =>
+        FValueTest.test(dataset.toDF, getFeaturesCol, getLabelCol, true)
+      case _ =>
+        throw new IllegalArgumentException(s"Unsupported combination:" +
+          s" featureType=${$(featureType)}, labelType=${$(labelType)}")
+    }
+
+    val indices =
+      selectIndicesFromPValues(numFeatures, resultDF, $(selectionMode), $(selectionThreshold))
+
+    copyValues(new UnivariateFeatureSelectorModel(uid, indices)
+      .setParent(this))
+  }
+
+  def getTopIndices(df: DataFrame, k: Int): Array[Int] = {
+    val spark = SparkSession.builder().getOrCreate()
+    import spark.implicits._
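+    // Rank features by ascending p-value (ties broken by featureIndex) and keep the first k.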
+    df.sort("pValue", "featureIndex")
+      .select("featureIndex")
+      .limit(k)
+      .as[Int]
+      .collect()
+  }
+
+  def selectIndicesFromPValues(
+      numFeatures: Int,
+      resultDF: DataFrame,
+      selectionMode: String,
+      selectionThreshold: Double): Array[Int] = {
+    val spark = SparkSession.builder().getOrCreate()
+    import spark.implicits._
+    val indices = selectionMode match {
+      case "numTopFeatures" =>
+        getTopIndices(resultDF, selectionThreshold.toInt)
+      case "percentile" =>
+        getTopIndices(resultDF, (numFeatures * selectionThreshold).toInt)
+      case "fpr" =>
+        resultDF.select("featureIndex")
+          .where(col("pValue") < selectionThreshold)
+          .as[Int].collect()
+      case "fdr" =>
+        // This uses the Benjamini-Hochberg procedure.
+        // https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure
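+        // With p-values sorted ascending, keep the largest k such that
+        // the k-th smallest p-value <= k * selectionThreshold / numFeatures.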
+        val f = selectionThreshold / numFeatures
+        val maxIndex = resultDF.sort("pValue", "featureIndex")
+          .select("pValue")
+          .as[Double].rdd
+          .zipWithIndex
+          .flatMap { case (pValue, index) =>
+            if (pValue <= f * (index + 1)) {
+              Iterator.single(index.toInt)
+            } else Iterator.empty
+          }.fold(-1)(math.max)
+        if (maxIndex >= 0) {
+          getTopIndices(resultDF, maxIndex + 1)
+        } else Array.emptyIntArray
+      case "fwe" =>
+        resultDF.select("featureIndex")
+          .where(col("pValue") < selectionThreshold / numFeatures)
+          .as[Int].collect()
+      case errorType =>
+        throw new IllegalArgumentException(s"Unknown Selector Type: $errorType")
+    }
+    indices
+  }
+
+  @Since("3.1.1")
+  override def transformSchema(schema: StructType): StructType = {
+    SchemaUtils.checkColumnType(schema, $(featuresCol), new VectorUDT)
+    SchemaUtils.checkNumericType(schema, $(labelCol))
+    SchemaUtils.appendColumn(schema, $(outputCol), new VectorUDT)
+  }
+
+  @Since("3.1.1")
+  override def copy(extra: ParamMap): UnivariateFeatureSelector = defaultCopy(extra)
+}
+
+@Since("3.1.1")
+object UnivariateFeatureSelector extends DefaultParamsReadable[UnivariateFeatureSelector] {
+
+  @Since("3.1.1")
+  override def load(path: String): UnivariateFeatureSelector = super.load(path)
+}
+
+/**
+ * Model fitted by [[UnivariateFeatureSelector]].
+ */
+@Since("3.1.1")
+class UnivariateFeatureSelectorModel private[ml](
+    @Since("3.1.1") override val uid: String,
+    @Since("3.1.1") val selectedFeatures: Array[Int])
+  extends Model[UnivariateFeatureSelectorModel] with UnivariateFeatureSelectorParams
+    with MLWritable {
+
+  /** @group setParam */
+  @Since("3.1.1")
+  def setFeaturesCol(value: String): this.type = set(featuresCol, value)
+
+  /** @group setParam */
+  @Since("3.1.1")
+  def setOutputCol(value: String): this.type = set(outputCol, value)
+
+  protected def isNumericAttribute = true
+
+  @Since("3.1.1")
+  override def transform(dataset: Dataset[_]): DataFrame = {
+    val outputSchema = transformSchema(dataset.schema, logging = true)
+
+    UnivariateFeatureSelectorModel
+      .transform(dataset, selectedFeatures.sorted, outputSchema, $(outputCol), $(featuresCol))
+  }
+
+  @Since("3.1.1")
+  override def transformSchema(schema: StructType): StructType = {
+    SchemaUtils.checkColumnType(schema, $(featuresCol), new VectorUDT)
+    val newField =
+      UnivariateFeatureSelectorModel
+        .prepOutputField(schema, selectedFeatures, $(outputCol), $(featuresCol), isNumericAttribute)
+    SchemaUtils.appendColumn(schema, newField)
+  }
+
+  @Since("3.1.1")
+  override def copy(extra: ParamMap): UnivariateFeatureSelectorModel = {
+    val copied = new UnivariateFeatureSelectorModel(uid, selectedFeatures)
+      .setParent(parent)
+    copyValues(copied, extra)
+  }
+
+  @Since("3.1.1")
+  override def write: MLWriter =
+    new UnivariateFeatureSelectorModel.UnivariateFeatureSelectorModelWriter(this)
+
+  @Since("3.1.1")
+  override def toString: String = {
+    s"UnivariateFeatureSelectorModel: uid=$uid, numSelectedFeatures=${selectedFeatures.length}"
+  }
+}
+
+@Since("3.1.1")
+object UnivariateFeatureSelectorModel extends MLReadable[UnivariateFeatureSelectorModel] {
+
+  @Since("3.1.1")
+  override def read: MLReader[UnivariateFeatureSelectorModel] =
+    new UnivariateFeatureSelectorModelReader
+
+  @Since("3.1.1")
+  override def load(path: String): UnivariateFeatureSelectorModel = super.load(path)
+
+  private[UnivariateFeatureSelectorModel] class UnivariateFeatureSelectorModelWriter(
+      instance: UnivariateFeatureSelectorModel) extends MLWriter {
+
+    private case class Data(selectedFeatures: Seq[Int])
+
+    override protected def saveImpl(path: String): Unit = {
+      DefaultParamsWriter.saveMetadata(instance, path, sc)
+      val data = Data(instance.selectedFeatures.toSeq)
+      val dataPath = new Path(path, "data").toString
+      sparkSession.createDataFrame(Seq(data)).repartition(1).write.parquet(dataPath)
+    }
+  }
+
+  private class UnivariateFeatureSelectorModelReader
+    extends MLReader[UnivariateFeatureSelectorModel] {
+
+    /** Checked against metadata when loading model */
+    private val className = classOf[UnivariateFeatureSelectorModel].getName
+
+    override def load(path: String): UnivariateFeatureSelectorModel = {
+      val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
+      val dataPath = new Path(path, "data").toString
+      val data = sparkSession.read.parquet(dataPath)
+        .select("selectedFeatures").head()
+      val selectedFeatures = data.getAs[Seq[Int]](0).toArray
+      val model = new UnivariateFeatureSelectorModel(metadata.uid, selectedFeatures)
+      metadata.getAndSetParams(model)
+      model
+    }
+  }
+
+  private def transform(
+      dataset: Dataset[_],
+      selectedFeatures: Array[Int],
+      outputSchema: StructType,
+      outputCol: String,
+      featuresCol: String): DataFrame = {
+    val newSize = selectedFeatures.length
+    val func = { vector: Vector =>
+      vector match {
+        case SparseVector(_, indices, values) =>
+          val (newIndices, newValues) =
+            compressSparse(indices, values, selectedFeatures)
+          Vectors.sparse(newSize, newIndices, newValues)
+        case DenseVector(values) =>
+          Vectors.dense(selectedFeatures.map(values))
+        case other =>
+          throw new UnsupportedOperationException(
+            s"Only sparse and dense vectors are supported but got ${other.getClass}.")
+      }
+    }
+
+    val transformer = udf(func)
+    dataset.withColumn(outputCol, transformer(col(featuresCol)),
+      outputSchema(outputCol).metadata)
+  }
+
+  /**
+   * Prepare the output column field, including per-feature metadata.
+   */
+  private def prepOutputField(
+      schema: StructType,
+      selectedFeatures: Array[Int],
+      outputCol: String,
+      featuresCol: String,
+      isNumericAttribute: Boolean): StructField = {
+    val selector = selectedFeatures.toSet
+    val origAttrGroup = AttributeGroup.fromStructField(schema(featuresCol))
+    val featureAttributes: Array[Attribute] = if (origAttrGroup.attributes.nonEmpty) {
+      origAttrGroup.attributes.get.zipWithIndex.filter(x => selector.contains(x._2)).map(_._1)
+    } else {
+      if (isNumericAttribute) {
+        Array.fill[Attribute](selector.size)(NumericAttribute.defaultAttr)
+      } else {
+        Array.fill[Attribute](selector.size)(NominalAttribute.defaultAttr)
+      }
+    }
+    val newAttributeGroup = new AttributeGroup(outputCol, featureAttributes)
+    newAttributeGroup.toStructField()
+  }
+
+  private def compressSparse(
+      indices: Array[Int],
+      values: Array[Double],
+      selectedFeatures: Array[Int]): (Array[Int], Array[Double]) = {
+    val newValues = new ArrayBuilder.ofDouble
+    val newIndices = new ArrayBuilder.ofInt
+    var i = 0
+    var j = 0
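+    // Two-pointer scan: both indices and selectedFeatures are sorted ascending, so advance
+    // whichever side lags; on a match, remap the original feature index to its new position j.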
+    while (i < indices.length && j < selectedFeatures.length) {
+      val indicesIdx = indices(i)
+      val filterIndicesIdx = selectedFeatures(j)
+      if (indicesIdx == filterIndicesIdx) {
+        newIndices += j
+        newValues += values(i)
+        j += 1
+        i += 1
+      } else {
+        if (indicesIdx > filterIndicesIdx) {
+          j += 1
+        } else {
+          i += 1
+        }
+      }
+    }
+    // TODO: Sparse representation might be inefficient if (newSize ~= newValues.size)
+    (newIndices.result(), newValues.result())
+  }
+}
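
For readers skimming the patch, here is a minimal, hypothetical usage sketch of the estimator added above. The data, column names and threshold are illustrative only, and `spark` is assumed to be an active `SparkSession`:

```scala
import org.apache.spark.ml.feature.UnivariateFeatureSelector
import org.apache.spark.ml.linalg.Vectors

// Continuous features with a categorical label, so fit() dispatches
// to the ANOVA F-test (f_classif) internally.
val df = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(1.7, 4.4, 7.6, 5.8, 9.6, 2.3)),
  (0.0, Vectors.dense(8.8, 7.3, 5.7, 7.3, 2.2, 4.1)),
  (0.0, Vectors.dense(1.2, 9.5, 2.5, 3.1, 8.7, 2.5)),
  (1.0, Vectors.dense(3.7, 9.2, 6.1, 4.1, 7.5, 3.8))
)).toDF("label", "features")

val selector = new UnivariateFeatureSelector()
  .setFeatureType("continuous")
  .setLabelType("categorical")
  .setSelectionMode("numTopFeatures")
  .setSelectionThreshold(2.0)   // must be a positive integral value for numTopFeatures
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setOutputCol("selected")

val model = selector.fit(df)
println(model.selectedFeatures.mkString(", "))  // indices of the retained features
model.transform(df).show(false)
```

If `setSelectionThreshold` is omitted, `fit` falls back to the per-mode defaults resolved above: 50 for `numTopFeatures`, 0.1 for `percentile`, and 0.05 for `fpr`, `fdr` and `fwe`.
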
diff --git a/mllib/src/main/scala/org/apache/spark/ml/stat/ANOVATest.scala b/mllib/src/main/scala/org/apache/spark/ml/stat/ANOVATest.scala
index f14f63b..7a7e76c 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/stat/ANOVATest.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/stat/ANOVATest.scala
@@ -35,7 +35,7 @@ import org.apache.spark.util.collection.OpenHashMap
  * information on ANOVA test.
  */
 @Since("3.1.0")
-object ANOVATest {
+private[ml] object ANOVATest {
 
   /**
    * @param dataset  DataFrame of categorical labels and continuous features.
diff --git a/mllib/src/main/scala/org/apache/spark/ml/stat/FValueTest.scala b/mllib/src/main/scala/org/apache/spark/ml/stat/FValueTest.scala
index ad506ab..f315e92 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/stat/FValueTest.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/stat/FValueTest.scala
@@ -30,7 +30,7 @@ import org.apache.spark.sql.functions._
  * FValue test for continuous data.
  */
 @Since("3.1.0")
-object FValueTest {
+private[ml] object FValueTest {
 
   /** Used to construct output schema of tests */
   private case class FValueResult(
diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/ANOVASelectorSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/ANOVASelectorSuite.scala
deleted file mode 100644
index 0d664e4..0000000
--- a/mllib/src/test/scala/org/apache/spark/ml/feature/ANOVASelectorSuite.scala
+++ /dev/null
@@ -1,206 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.ml.feature
-
-import org.apache.spark.ml.linalg.{Vector, Vectors}
-import org.apache.spark.ml.param.ParamsSuite
-import org.apache.spark.ml.util.{DefaultReadWriteTest, MLTest, MLTestingUtils}
-import org.apache.spark.ml.util.TestingUtils._
-import org.apache.spark.sql.{Dataset, Row}
-
-class ANOVASelectorSuite extends MLTest with DefaultReadWriteTest {
-
-  import testImplicits._
-
-  @transient var dataset: Dataset[_] = _
-
-  override def beforeAll(): Unit = {
-    super.beforeAll()
-
-    // scalastyle:off
-    /*
-      X:
-      array([[4.65415496e-03, 1.03550567e-01, -1.17358140e+00,
-      1.61408773e-01,  3.92492111e-01,  7.31240882e-01],
-      [-9.01651741e-01, -5.28905302e-01,  1.27636785e+00,
-      7.02154563e-01,  6.21348351e-01,  1.88397353e-01],
-      [ 3.85692159e-01, -9.04639637e-01,  5.09782604e-02,
-      8.40043971e-01,  7.45977857e-01,  8.78402288e-01],
-      [ 1.36264353e+00,  2.62454094e-01,  7.96306202e-01,
-      6.14948000e-01,  7.44948187e-01,  9.74034830e-01],
-      [ 9.65874070e-01,  2.52773665e+00, -2.19380094e+00,
-      2.33408080e-01,  1.86340919e-01,  8.23390433e-01],
-      [ 1.12324305e+01, -2.77121515e-01,  1.12740513e-01,
-      2.35184013e-01,  3.46668895e-01,  9.38500782e-02],
-      [ 1.06195839e+01, -1.82891238e+00,  2.25085601e-01,
-      9.09979851e-01,  6.80257535e-02,  8.24017480e-01],
-      [ 1.12806837e+01,  1.30686889e+00,  9.32839108e-02,
-      3.49784755e-01,  1.71322408e-02,  7.48465194e-02],
-      [ 9.98689462e+00,  9.50808938e-01, -2.90786359e-01,
-      2.31253009e-01,  7.46270968e-01,  1.60308169e-01],
-      [ 1.08428551e+01, -1.02749936e+00,  1.73951508e-01,
-      8.92482744e-02,  1.42651730e-01,  7.66751625e-01],
-      [-1.98641448e+00,  1.12811990e+01, -2.35246756e-01,
-      8.22809049e-01,  3.26739456e-01,  7.88268404e-01],
-      [-6.09864090e-01,  1.07346276e+01, -2.18805509e-01,
-      7.33931213e-01,  1.42554396e-01,  7.11225605e-01],
-      [-1.58481268e+00,  9.19364039e+00, -5.87490459e-02,
-      2.51532056e-01,  2.82729807e-01,  7.16245686e-01],
-      [-2.50949277e-01,  1.12815254e+01, -6.94806734e-01,
-      5.93898886e-01,  5.68425656e-01,  8.49762330e-01],
-      [ 7.63485129e-01,  1.02605138e+01,  1.32617719e+00,
-      5.49682879e-01,  8.59931442e-01,  4.88677978e-02],
-      [ 9.34900015e-01,  4.11379043e-01,  8.65010205e+00,
-      9.23509168e-01,  1.16995043e-01,  5.91894106e-03],
-      [ 4.73734933e-01, -1.48321181e+00,  9.73349621e+00,
-      4.09421563e-01,  5.09375719e-01,  5.93157850e-01],
-      [ 3.41470679e-01, -6.88972582e-01,  9.60347938e+00,
-      3.62654055e-01,  2.43437468e-01,  7.13052838e-01],
-      [-5.29614251e-01, -1.39262856e+00,  1.01354144e+01,
-      8.24123861e-01,  5.84074506e-01,  6.54461558e-01],
-      [-2.99454508e-01,  2.20457263e+00,  1.14586015e+01,
-      5.16336729e-01,  9.99776159e-01,  3.15769738e-01]])
-      y:
-      array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4])
-      scikit-learn result:
-      >>> f_classif(X, y)
-      (array([228.27701422,  84.33070501, 134.25330675,   0.82211775, 0.82991363,   1.08478943]),
-       array([2.43864448e-13, 5.09088367e-10, 1.49033067e-11, 5.00596446e-01, 4.96684374e-01, 3.83798191e-01]))
-    */
-    // scalastyle:on
-
-    val data = Seq(
-      (1, Vectors.dense(4.65415496e-03, 1.03550567e-01, -1.17358140e+00,
-      1.61408773e-01, 3.92492111e-01, 7.31240882e-01), Vectors.dense(4.65415496e-03)),
-      (1, Vectors.dense(-9.01651741e-01, -5.28905302e-01, 1.27636785e+00,
-      7.02154563e-01, 6.21348351e-01, 1.88397353e-01), Vectors.dense(-9.01651741e-01)),
-      (1, Vectors.dense(3.85692159e-01, -9.04639637e-01, 5.09782604e-02,
-      8.40043971e-01, 7.45977857e-01, 8.78402288e-01), Vectors.dense(3.85692159e-01)),
-      (1, Vectors.dense(1.36264353e+00, 2.62454094e-01, 7.96306202e-01,
-      6.14948000e-01, 7.44948187e-01, 9.74034830e-01), Vectors.dense(1.36264353e+00)),
-      (1, Vectors.dense(9.65874070e-01, 2.52773665e+00, -2.19380094e+00,
-        2.33408080e-01, 1.86340919e-01, 8.23390433e-01), Vectors.dense(9.65874070e-01)),
-      (2, Vectors.dense(1.12324305e+01, -2.77121515e-01, 1.12740513e-01,
-        2.35184013e-01, 3.46668895e-01, 9.38500782e-02), Vectors.dense(1.12324305e+01)),
-      (2, Vectors.dense(1.06195839e+01, -1.82891238e+00, 2.25085601e-01,
-        9.09979851e-01, 6.80257535e-02, 8.24017480e-01), Vectors.dense(1.06195839e+01)),
-      (2, Vectors.dense(1.12806837e+01, 1.30686889e+00, 9.32839108e-02,
-        3.49784755e-01, 1.71322408e-02, 7.48465194e-02), Vectors.dense(1.12806837e+01)),
-      (2, Vectors.dense(9.98689462e+00, 9.50808938e-01, -2.90786359e-01,
-        2.31253009e-01, 7.46270968e-01, 1.60308169e-01), Vectors.dense(9.98689462e+00)),
-      (2, Vectors.dense(1.08428551e+01, -1.02749936e+00, 1.73951508e-01,
-        8.92482744e-02, 1.42651730e-01, 7.66751625e-01), Vectors.dense(1.08428551e+01)),
-      (3, Vectors.dense(-1.98641448e+00, 1.12811990e+01, -2.35246756e-01,
-        8.22809049e-01, 3.26739456e-01, 7.88268404e-01), Vectors.dense(-1.98641448e+00)),
-      (3, Vectors.dense(-6.09864090e-01, 1.07346276e+01, -2.18805509e-01,
-        7.33931213e-01, 1.42554396e-01, 7.11225605e-01), Vectors.dense(-6.09864090e-01)),
-      (3, Vectors.dense(-1.58481268e+00, 9.19364039e+00, -5.87490459e-02,
-        2.51532056e-01, 2.82729807e-01, 7.16245686e-01), Vectors.dense(-1.58481268e+00)),
-      (3, Vectors.dense(-2.50949277e-01, 1.12815254e+01, -6.94806734e-01,
-        5.93898886e-01, 5.68425656e-01, 8.49762330e-01), Vectors.dense(-2.50949277e-01)),
-      (3, Vectors.dense(7.63485129e-01, 1.02605138e+01, 1.32617719e+00,
-        5.49682879e-01, 8.59931442e-01, 4.88677978e-02), Vectors.dense(7.63485129e-01)),
-      (4, Vectors.dense(9.34900015e-01, 4.11379043e-01, 8.65010205e+00,
-        9.23509168e-01, 1.16995043e-01, 5.91894106e-03), Vectors.dense(9.34900015e-01)),
-      (4, Vectors.dense(4.73734933e-01, -1.48321181e+00, 9.73349621e+00,
-        4.09421563e-01, 5.09375719e-01, 5.93157850e-01), Vectors.dense(4.73734933e-01)),
-      (4, Vectors.dense(3.41470679e-01, -6.88972582e-01, 9.60347938e+00,
-        3.62654055e-01, 2.43437468e-01, 7.13052838e-01), Vectors.dense(3.41470679e-01)),
-      (4, Vectors.dense(-5.29614251e-01, -1.39262856e+00, 1.01354144e+01,
-        8.24123861e-01, 5.84074506e-01, 6.54461558e-01), Vectors.dense(-5.29614251e-01)),
-      (4, Vectors.dense(-2.99454508e-01, 2.20457263e+00, 1.14586015e+01,
-        5.16336729e-01, 9.99776159e-01, 3.15769738e-01), Vectors.dense(-2.99454508e-01)))
-
-    dataset = spark.createDataFrame(data).toDF("label", "features", "topFeature")
-  }
-
-  test("params") {
-    ParamsSuite.checkParams(new ANOVASelector())
-  }
-
-  test("Test ANOVAFValue classification selector: numTopFeatures") {
-    val selector = new ANOVASelector()
-      .setOutputCol("filtered").setSelectorType("numTopFeatures").setNumTopFeatures(1)
-    val model = testSelector(selector, dataset)
-    MLTestingUtils.checkCopyAndUids(selector, model)
-  }
-
-  test("Test ANOVAFValue classification selector: percentile") {
-    val selector = new ANOVASelector()
-      .setOutputCol("filtered").setSelectorType("percentile").setPercentile(0.17)
-    val model = testSelector(selector, dataset)
-    MLTestingUtils.checkCopyAndUids(selector, model)
-  }
-
-  test("Test ANOVAFValue classification selector: fpr") {
-    val selector = new ANOVASelector()
-      .setOutputCol("filtered").setSelectorType("fpr").setFpr(1.0E-12)
-    val model = testSelector(selector, dataset)
-    MLTestingUtils.checkCopyAndUids(selector, model)
-  }
-
-  test("Test ANOVAFValue classification selector: fdr") {
-    val selector = new ANOVASelector()
-      .setOutputCol("filtered").setSelectorType("fdr").setFdr(6.0E-12)
-    val model = testSelector(selector, dataset)
-    MLTestingUtils.checkCopyAndUids(selector, model)
-  }
-
-  test("Test ANOVAFValue classification selector: fwe") {
-    val selector = new ANOVASelector()
-      .setOutputCol("filtered").setSelectorType("fwe").setFwe(6.0E-12)
-    val model = testSelector(selector, dataset)
-    MLTestingUtils.checkCopyAndUids(selector, model)
-  }
-
-  test("read/write") {
-    def checkModelData(model: ANOVASelectorModel, model2: ANOVASelectorModel): Unit = {
-      assert(model.selectedFeatures === model2.selectedFeatures)
-    }
-    val anovaSelector = new ANOVASelector()
-    testEstimatorAndModelReadWrite(anovaSelector, dataset,
-      ANOVASelectorSuite.allParamSettings,
-      ANOVASelectorSuite.allParamSettings, checkModelData)
-  }
-
-  private def testSelector(selector: ANOVASelector, data: Dataset[_]):
-  ANOVASelectorModel = {
-    val selectorModel = selector.fit(data)
-    testTransformer[(Double, Vector, Vector)](data.toDF(), selectorModel,
-      "filtered", "topFeature") {
-      case Row(vec1: Vector, vec2: Vector) =>
-        assert(vec1 ~== vec2 absTol 1e-1)
-    }
-    selectorModel
-  }
-}
-
-object ANOVASelectorSuite {
-
-  /**
-   * Mapping from all Params to valid settings which differ from the defaults.
-   * This is useful for tests which need to exercise all Params, such as save/load.
-   * This excludes input columns to simplify some tests.
-   */
-  val allParamSettings: Map[String, Any] = Map(
-    "selectorType" -> "percentile",
-    "numTopFeatures" -> 1,
-    "percentile" -> 0.12,
-    "outputCol" -> "myOutput"
-  )
-}
diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/FValueSelectorSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/FValueSelectorSuite.scala
deleted file mode 100644
index 5c12001..0000000
--- a/mllib/src/test/scala/org/apache/spark/ml/feature/FValueSelectorSuite.scala
+++ /dev/null
@@ -1,238 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.ml.feature
-
-import org.apache.spark.ml.linalg.{Vector, Vectors}
-import org.apache.spark.ml.param.ParamsSuite
-import org.apache.spark.ml.util.{DefaultReadWriteTest, MLTest, MLTestingUtils}
-import org.apache.spark.ml.util.TestingUtils._
-import org.apache.spark.sql.{Dataset, Row}
-
-class FValueSelectorSuite extends MLTest with DefaultReadWriteTest {
-
-  import testImplicits._
-
-  @transient var dataset: Dataset[_] = _
-
-  override def beforeAll(): Unit = {
-    super.beforeAll()
-
-    // scalastyle:off
-    /*
-    Use the following sklearn data in this test
-
-    >>> from sklearn.feature_selection import f_regression
-    >>> import numpy as np
-    >>> np.random.seed(777)
-    >>> X = np.random.rand(20, 6)
-    >>> w = np.array([0.3, 0.4, 0.5, 0, 0, 0])
-    >>> y = X @ w
-    >>> X
-    array([[0.19151945, 0.62210877, 0.43772774, 0.78535858, 0.77997581,
-            0.27259261],
-           [0.27646426, 0.80187218, 0.95813935, 0.87593263, 0.35781727,
-            0.50099513],
-           [0.68346294, 0.71270203, 0.37025075, 0.56119619, 0.50308317,
-            0.01376845],
-           [0.77282662, 0.88264119, 0.36488598, 0.61539618, 0.07538124,
-            0.36882401],
-           [0.9331401 , 0.65137814, 0.39720258, 0.78873014, 0.31683612,
-            0.56809865],
-           [0.86912739, 0.43617342, 0.80214764, 0.14376682, 0.70426097,
-            0.70458131],
-           [0.21879211, 0.92486763, 0.44214076, 0.90931596, 0.05980922,
-            0.18428708],
-           [0.04735528, 0.67488094, 0.59462478, 0.53331016, 0.04332406,
-            0.56143308],
-           [0.32966845, 0.50296683, 0.11189432, 0.60719371, 0.56594464,
-            0.00676406],
-           [0.61744171, 0.91212289, 0.79052413, 0.99208147, 0.95880176,
-            0.79196414],
-           [0.28525096, 0.62491671, 0.4780938 , 0.19567518, 0.38231745,
-            0.05387369],
-           [0.45164841, 0.98200474, 0.1239427 , 0.1193809 , 0.73852306,
-            0.58730363],
-           [0.47163253, 0.10712682, 0.22921857, 0.89996519, 0.41675354,
-            0.53585166],
-           [0.00620852, 0.30064171, 0.43689317, 0.612149  , 0.91819808,
-            0.62573667],
-           [0.70599757, 0.14983372, 0.74606341, 0.83100699, 0.63372577,
-            0.43830988],
-           [0.15257277, 0.56840962, 0.52822428, 0.95142876, 0.48035918,
-            0.50255956],
-           [0.53687819, 0.81920207, 0.05711564, 0.66942174, 0.76711663,
-             0.70811536],
-           [0.79686718, 0.55776083, 0.96583653, 0.1471569 , 0.029647  ,
-            0.59389349],
-           [0.1140657 , 0.95080985, 0.32570741, 0.19361869, 0.45781165,
-            0.92040257],
-           [0.87906916, 0.25261576, 0.34800879, 0.18258873, 0.90179605,
-            0.70652816]])
-    >>> y
-    array([0.52516321, 0.88275782, 0.67524507, 0.76734745, 0.73909458,
-           0.83628141, 0.65665506, 0.58147135, 0.35603443, 0.94534373,
-           0.57458887, 0.59026777, 0.29894977, 0.34056582, 0.64476446,
-           0.53724782, 0.5173021 , 0.94508275, 0.57739736, 0.53877145])
-    >>> f_regression(X, y)
-    (array([5.58025504,  3.98311705, 20.59605518,  0.07993376,  1.25127646,
-            0.7676937 ]),
-    array([2.96302196e-02, 6.13173918e-02, 2.54580618e-04, 7.80612726e-01,
-    2.78015517e-01, 3.92474567e-01]))
-    */
-    // scalastyle:on
-
-    val data = Seq(
-      (0.52516321, Vectors.dense(0.19151945, 0.62210877, 0.43772774, 0.78535858, 0.77997581,
-        0.27259261), Vectors.dense(0.43772774)),
-      (0.88275782, Vectors.dense(0.27646426, 0.80187218, 0.95813935, 0.87593263, 0.35781727,
-        0.50099513), Vectors.dense(0.95813935)),
-      (0.67524507, Vectors.dense(0.68346294, 0.71270203, 0.37025075, 0.56119619, 0.50308317,
-        0.01376845), Vectors.dense(0.37025075)),
-      (0.76734745, Vectors.dense(0.77282662, 0.88264119, 0.36488598, 0.61539618, 0.07538124,
-        0.36882401), Vectors.dense(0.36488598)),
-      (0.73909458, Vectors.dense(0.9331401, 0.65137814, 0.39720258, 0.78873014, 0.31683612,
-        0.56809865), Vectors.dense(0.39720258)),
-
-      (0.83628141, Vectors.dense(0.86912739, 0.43617342, 0.80214764, 0.14376682, 0.70426097,
-        0.70458131), Vectors.dense(0.80214764)),
-      (0.65665506, Vectors.dense(0.21879211, 0.92486763, 0.44214076, 0.90931596, 0.05980922,
-        0.18428708), Vectors.dense(0.44214076)),
-      (0.58147135, Vectors.dense(0.04735528, 0.67488094, 0.59462478, 0.53331016, 0.04332406,
-        0.56143308), Vectors.dense(0.59462478)),
-      (0.35603443, Vectors.dense(0.32966845, 0.50296683, 0.11189432, 0.60719371, 0.56594464,
-        0.00676406), Vectors.dense(0.11189432)),
-      (0.94534373, Vectors.dense(0.61744171, 0.91212289, 0.79052413, 0.99208147, 0.95880176,
-        0.79196414), Vectors.dense(0.79052413)),
-
-      (0.57458887, Vectors.dense(0.28525096, 0.62491671, 0.4780938, 0.19567518, 0.38231745,
-        0.05387369), Vectors.dense(0.4780938)),
-      (0.59026777, Vectors.dense(0.45164841, 0.98200474, 0.1239427, 0.1193809, 0.73852306,
-        0.58730363), Vectors.dense(0.1239427)),
-      (0.29894977, Vectors.dense(0.47163253, 0.10712682, 0.22921857, 0.89996519, 0.41675354,
-        0.53585166), Vectors.dense(0.22921857)),
-      (0.34056582, Vectors.dense(0.00620852, 0.30064171, 0.43689317, 0.612149, 0.91819808,
-        0.62573667), Vectors.dense(0.43689317)),
-      (0.64476446, Vectors.dense(0.70599757, 0.14983372, 0.74606341, 0.83100699, 0.63372577,
-        0.43830988), Vectors.dense(0.74606341)),
-
-      (0.53724782, Vectors.dense(0.15257277, 0.56840962, 0.52822428, 0.95142876, 0.48035918,
-        0.50255956), Vectors.dense(0.52822428)),
-      (0.5173021, Vectors.dense(0.53687819, 0.81920207, 0.05711564, 0.66942174, 0.76711663,
-        0.70811536), Vectors.dense(0.05711564)),
-      (0.94508275, Vectors.dense(0.79686718, 0.55776083, 0.96583653, 0.1471569, 0.029647,
-        0.59389349), Vectors.dense(0.96583653)),
-      (0.57739736, Vectors.dense(0.1140657, 0.95080985, 0.96583653, 0.19361869, 0.45781165,
-        0.92040257), Vectors.dense(0.96583653)),
-      (0.53877145, Vectors.dense(0.87906916, 0.25261576, 0.34800879, 0.18258873, 0.90179605,
-        0.70652816), Vectors.dense(0.34800879)))
-
-    dataset = spark.createDataFrame(data).toDF("label", "features", "topFeature")
-  }
-
-  test("params") {
-    ParamsSuite.checkParams(new FValueSelector)
-  }
-
-  test("Test FValue selector: numTopFeatures") {
-    val selector = new FValueSelector()
-      .setOutputCol("filtered").setSelectorType("numTopFeatures").setNumTopFeatures(1)
-    val model = testSelector(selector, dataset)
-    MLTestingUtils.checkCopyAndUids(selector, model)
-  }
-
-  test("Test F Value selector: percentile") {
-    val selector = new FValueSelector()
-      .setOutputCol("filtered").setSelectorType("percentile").setPercentile(0.17)
-    val model = testSelector(selector, dataset)
-    MLTestingUtils.checkCopyAndUids(selector, model)
-  }
-
-  test("Test F Value selector: fpr") {
-    val selector = new FValueSelector()
-      .setOutputCol("filtered").setSelectorType("fpr").setFpr(0.01)
-    val model = testSelector(selector, dataset)
-    MLTestingUtils.checkCopyAndUids(selector, model)
-  }
-
-  test("Test F Value selector: fdr") {
-    val selector = new FValueSelector()
-      .setOutputCol("filtered").setSelectorType("fdr").setFdr(0.03)
-    val model = testSelector(selector, dataset)
-    MLTestingUtils.checkCopyAndUids(selector, model)
-  }
-
-  test("Test F Value selector: fwe") {
-    val selector = new FValueSelector()
-      .setOutputCol("filtered").setSelectorType("fwe").setFwe(0.03)
-    val model = testSelector(selector, dataset)
-    MLTestingUtils.checkCopyAndUids(selector, model)
-  }
-
-  test("Test FValue selector with sparse vector") {
-    val df = spark.createDataFrame(Seq(
-      (4.6, Vectors.sparse(6, Array((0, 6.0), (1, 7.0), (3, 7.0), (4, 6.0))), Vectors.dense(0.0)),
-      (6.6, Vectors.sparse(6, Array((1, 9.0), (2, 6.0), (4, 5.0), (5, 9.0))), Vectors.dense(6.0)),
-      (5.1, Vectors.sparse(6, Array((1, 9.0), (2, 3.0), (4, 5.0), (5, 5.0))), Vectors.dense(3.0)),
-      (7.6, Vectors.dense(Array(0.0, 9.0, 8.0, 5.0, 6.0, 4.0)), Vectors.dense(8.0)),
-      (9.0, Vectors.dense(Array(8.0, 9.0, 6.0, 5.0, 4.0, 4.0)), Vectors.dense(6.0)),
-      (9.0, Vectors.dense(Array(8.0, 9.0, 6.0, 4.0, 0.0, 0.0)), Vectors.dense(6.0))
-    )).toDF("label", "features", "topFeature")
-
-    val selector = new FValueSelector()
-      .setOutputCol("filtered").setSelectorType("numTopFeatures").setNumTopFeatures(1)
-    val model = testSelector(selector, df)
-    MLTestingUtils.checkCopyAndUids(selector, model)
-  }
-
-  test("read/write") {
-    def checkModelData(model: FValueSelectorModel, model2:
-      FValueSelectorModel): Unit = {
-      assert(model.selectedFeatures === model2.selectedFeatures)
-    }
-    val fSelector = new FValueSelector
-    testEstimatorAndModelReadWrite(fSelector, dataset,
-      FValueSelectorSuite.allParamSettings,
-      FValueSelectorSuite.allParamSettings, checkModelData)
-  }
-
-  private def testSelector(selector: FValueSelector, data: Dataset[_]):
-      FValueSelectorModel = {
-    val selectorModel = selector.fit(data)
-    testTransformer[(Double, Vector, Vector)](data.toDF(), selectorModel,
-      "filtered", "topFeature") {
-      case Row(vec1: Vector, vec2: Vector) =>
-        assert(vec1 ~== vec2 absTol 1e-6)
-    }
-    selectorModel
-  }
-}
-
-object FValueSelectorSuite {
-
-  /**
-   * Mapping from all Params to valid settings which differ from the defaults.
-   * This is useful for tests which need to exercise all Params, such as save/load.
-   * This excludes input columns to simplify some tests.
-   */
-  val allParamSettings: Map[String, Any] = Map(
-    "selectorType" -> "percentile",
-    "numTopFeatures" -> 1,
-    "percentile" -> 0.12,
-    "outputCol" -> "myOutput"
-  )
-}
diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/UnivariateFeatureSelectorSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/UnivariateFeatureSelectorSuite.scala
new file mode 100644
index 0000000..84868dc
--- /dev/null
+++ b/mllib/src/test/scala/org/apache/spark/ml/feature/UnivariateFeatureSelectorSuite.scala
@@ -0,0 +1,685 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.spark.ml.linalg.{Vector, Vectors}
+import org.apache.spark.ml.param.ParamsSuite
+import org.apache.spark.ml.stat.{ANOVATest, FValueTest}
+import org.apache.spark.ml.util.{DefaultReadWriteTest, MLTest, MLTestingUtils}
+import org.apache.spark.ml.util.TestingUtils._
+import org.apache.spark.sql.{Dataset, Row}
+
+class UnivariateFeatureSelectorSuite extends MLTest with DefaultReadWriteTest {
+
+  import testImplicits._
+
+  @transient var datasetChi2: Dataset[_] = _
+  @transient var datasetAnova: Dataset[_] = _
+  @transient var datasetFRegression: Dataset[_] = _
+
+  private var selector1: UnivariateFeatureSelector = _
+  private var selector2: UnivariateFeatureSelector = _
+  private var selector3: UnivariateFeatureSelector = _
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    // Toy dataset, including the top feature for a chi-squared test.
+    // These data are chosen such that each feature's test has a distinct p-value.
+    /*
+     *  Contingency tables
+     *  feature1 = {6.0, 0.0, 8.0}
+     *  class  0 1 2
+     *    6.0||1|0|0|
+     *    0.0||0|3|0|
+     *    8.0||0|0|2|
+     *  degrees of freedom = 4, statistic = 12, pValue = 0.017
+     *
+     *  feature2 = {7.0, 9.0}
+     *  class  0 1 2
+     *    7.0||1|0|0|
+     *    9.0||0|3|2|
+     *  degrees of freedom = 2, statistic = 6, pValue = 0.049
+     *
+     *  feature3 = {0.0, 6.0, 3.0, 8.0}
+     *  class  0 1 2
+     *    0.0||1|0|0|
+     *    6.0||0|1|2|
+     *    3.0||0|1|0|
+     *    8.0||0|1|0|
+     *  degrees of freedom = 6, statistic = 8.66, pValue = 0.193
+     *
+     *  feature4 = {7.0, 0.0, 5.0, 4.0}
+     *  class  0 1 2
+     *    7.0||1|0|0|
+     *    0.0||0|2|0|
+     *    5.0||0|1|1|
+     *    4.0||0|0|1|
+     *  degrees of freedom = 6, statistic = 9.5, pValue = 0.147
+     *
+     *  feature5 = {6.0, 5.0, 4.0, 0.0}
+     *  class  0 1 2
+     *    6.0||1|1|0|
+     *    5.0||0|2|0|
+     *    4.0||0|0|1|
+     *    0.0||0|0|1|
+     *  degrees of freedom = 6, statistic = 8.0, pValue = 0.238
+     *
+     *  feature6 = {0.0, 9.0, 5.0, 4.0}
+     *  class  0 1 2
+     *    0.0||1|0|1|
+     *    9.0||0|1|0|
+     *    5.0||0|1|0|
+     *    4.0||0|1|1|
+     *  degrees of freedom = 6, statistic = 5, pValue = 0.54
+     *
+     *  To verify the results with R, run:
+     *  library(stats)
+     *  x1 <- c(6.0, 0.0, 0.0, 0.0, 8.0, 8.0)
+     *  x2 <- c(7.0, 9.0, 9.0, 9.0, 9.0, 9.0)
+     *  x3 <- c(0.0, 6.0, 3.0, 8.0, 6.0, 6.0)
+     *  x4 <- c(7.0, 0.0, 0.0, 5.0, 5.0, 4.0)
+     *  x5 <- c(6.0, 5.0, 5.0, 6.0, 4.0, 0.0)
+     *  x6 <- c(0.0, 9.0, 5.0, 4.0, 4.0, 0.0)
+     *  y <- c(0.0, 1.0, 1.0, 1.0, 2.0, 2.0)
+     *  chisq.test(x1,y)
+     *  chisq.test(x2,y)
+     *  chisq.test(x3,y)
+     *  chisq.test(x4,y)
+     *  chisq.test(x5,y)
+     *  chisq.test(x6,y)
+     */
+
+    datasetChi2 = spark.createDataFrame(Seq(
+      (0.0, Vectors.sparse(6, Array((0, 6.0), (1, 7.0), (3, 7.0), (4, 6.0))), Vectors.dense(6.0)),
+      (1.0, Vectors.sparse(6, Array((1, 9.0), (2, 6.0), (4, 5.0), (5, 9.0))), Vectors.dense(0.0)),
+      (1.0, Vectors.sparse(6, Array((1, 9.0), (2, 3.0), (4, 5.0), (5, 5.0))), Vectors.dense(0.0)),
+      (1.0, Vectors.dense(Array(0.0, 9.0, 8.0, 5.0, 6.0, 4.0)), Vectors.dense(0.0)),
+      (2.0, Vectors.dense(Array(8.0, 9.0, 6.0, 5.0, 4.0, 4.0)), Vectors.dense(8.0)),
+      (2.0, Vectors.dense(Array(8.0, 9.0, 6.0, 4.0, 0.0, 0.0)), Vectors.dense(8.0))
+    )).toDF("label", "features", "topFeature")
+
+    // scalastyle:off
+    /*
+      X:
+      array([[4.65415496e-03, 1.03550567e-01, -1.17358140e+00,
+      1.61408773e-01,  3.92492111e-01,  7.31240882e-01],
+      [-9.01651741e-01, -5.28905302e-01,  1.27636785e+00,
+      7.02154563e-01,  6.21348351e-01,  1.88397353e-01],
+      [ 3.85692159e-01, -9.04639637e-01,  5.09782604e-02,
+      8.40043971e-01,  7.45977857e-01,  8.78402288e-01],
+      [ 1.36264353e+00,  2.62454094e-01,  7.96306202e-01,
+      6.14948000e-01,  7.44948187e-01,  9.74034830e-01],
+      [ 9.65874070e-01,  2.52773665e+00, -2.19380094e+00,
+      2.33408080e-01,  1.86340919e-01,  8.23390433e-01],
+      [ 1.12324305e+01, -2.77121515e-01,  1.12740513e-01,
+      2.35184013e-01,  3.46668895e-01,  9.38500782e-02],
+      [ 1.06195839e+01, -1.82891238e+00,  2.25085601e-01,
+      9.09979851e-01,  6.80257535e-02,  8.24017480e-01],
+      [ 1.12806837e+01,  1.30686889e+00,  9.32839108e-02,
+      3.49784755e-01,  1.71322408e-02,  7.48465194e-02],
+      [ 9.98689462e+00,  9.50808938e-01, -2.90786359e-01,
+      2.31253009e-01,  7.46270968e-01,  1.60308169e-01],
+      [ 1.08428551e+01, -1.02749936e+00,  1.73951508e-01,
+      8.92482744e-02,  1.42651730e-01,  7.66751625e-01],
+      [-1.98641448e+00,  1.12811990e+01, -2.35246756e-01,
+      8.22809049e-01,  3.26739456e-01,  7.88268404e-01],
+      [-6.09864090e-01,  1.07346276e+01, -2.18805509e-01,
+      7.33931213e-01,  1.42554396e-01,  7.11225605e-01],
+      [-1.58481268e+00,  9.19364039e+00, -5.87490459e-02,
+      2.51532056e-01,  2.82729807e-01,  7.16245686e-01],
+      [-2.50949277e-01,  1.12815254e+01, -6.94806734e-01,
+      5.93898886e-01,  5.68425656e-01,  8.49762330e-01],
+      [ 7.63485129e-01,  1.02605138e+01,  1.32617719e+00,
+      5.49682879e-01,  8.59931442e-01,  4.88677978e-02],
+      [ 9.34900015e-01,  4.11379043e-01,  8.65010205e+00,
+      9.23509168e-01,  1.16995043e-01,  5.91894106e-03],
+      [ 4.73734933e-01, -1.48321181e+00,  9.73349621e+00,
+      4.09421563e-01,  5.09375719e-01,  5.93157850e-01],
+      [ 3.41470679e-01, -6.88972582e-01,  9.60347938e+00,
+      3.62654055e-01,  2.43437468e-01,  7.13052838e-01],
+      [-5.29614251e-01, -1.39262856e+00,  1.01354144e+01,
+      8.24123861e-01,  5.84074506e-01,  6.54461558e-01],
+      [-2.99454508e-01,  2.20457263e+00,  1.14586015e+01,
+      5.16336729e-01,  9.99776159e-01,  3.15769738e-01]])
+      y:
+      array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4])
+      scikit-learn result:
+      >>> f_classif(X, y)
+      (array([228.27701422,  84.33070501, 134.25330675,   0.82211775, 0.82991363,   1.08478943]),
+       array([2.43864448e-13, 5.09088367e-10, 1.49033067e-11, 5.00596446e-01, 4.96684374e-01, 3.83798191e-01]))
+    */
+    // scalastyle:on
+
+    val dataAnova = Seq(
+      (1, Vectors.dense(4.65415496e-03, 1.03550567e-01, -1.17358140e+00,
+      1.61408773e-01, 3.92492111e-01, 7.31240882e-01), Vectors.dense(4.65415496e-03)),
+      (1, Vectors.dense(-9.01651741e-01, -5.28905302e-01, 1.27636785e+00,
+      7.02154563e-01, 6.21348351e-01, 1.88397353e-01), Vectors.dense(-9.01651741e-01)),
+      (1, Vectors.dense(3.85692159e-01, -9.04639637e-01, 5.09782604e-02,
+      8.40043971e-01, 7.45977857e-01, 8.78402288e-01), Vectors.dense(3.85692159e-01)),
+      (1, Vectors.dense(1.36264353e+00, 2.62454094e-01, 7.96306202e-01,
+      6.14948000e-01, 7.44948187e-01, 9.74034830e-01), Vectors.dense(1.36264353e+00)),
+      (1, Vectors.dense(9.65874070e-01, 2.52773665e+00, -2.19380094e+00,
+        2.33408080e-01, 1.86340919e-01, 8.23390433e-01), Vectors.dense(9.65874070e-01)),
+      (2, Vectors.dense(1.12324305e+01, -2.77121515e-01, 1.12740513e-01,
+        2.35184013e-01, 3.46668895e-01, 9.38500782e-02), Vectors.dense(1.12324305e+01)),
+      (2, Vectors.dense(1.06195839e+01, -1.82891238e+00, 2.25085601e-01,
+        9.09979851e-01, 6.80257535e-02, 8.24017480e-01), Vectors.dense(1.06195839e+01)),
+      (2, Vectors.dense(1.12806837e+01, 1.30686889e+00, 9.32839108e-02,
+        3.49784755e-01, 1.71322408e-02, 7.48465194e-02), Vectors.dense(1.12806837e+01)),
+      (2, Vectors.dense(9.98689462e+00, 9.50808938e-01, -2.90786359e-01,
+        2.31253009e-01, 7.46270968e-01, 1.60308169e-01), Vectors.dense(9.98689462e+00)),
+      (2, Vectors.dense(1.08428551e+01, -1.02749936e+00, 1.73951508e-01,
+        8.92482744e-02, 1.42651730e-01, 7.66751625e-01), Vectors.dense(1.08428551e+01)),
+      (3, Vectors.dense(-1.98641448e+00, 1.12811990e+01, -2.35246756e-01,
+        8.22809049e-01, 3.26739456e-01, 7.88268404e-01), Vectors.dense(-1.98641448e+00)),
+      (3, Vectors.dense(-6.09864090e-01, 1.07346276e+01, -2.18805509e-01,
+        7.33931213e-01, 1.42554396e-01, 7.11225605e-01), Vectors.dense(-6.09864090e-01)),
+      (3, Vectors.dense(-1.58481268e+00, 9.19364039e+00, -5.87490459e-02,
+        2.51532056e-01, 2.82729807e-01, 7.16245686e-01), Vectors.dense(-1.58481268e+00)),
+      (3, Vectors.dense(-2.50949277e-01, 1.12815254e+01, -6.94806734e-01,
+        5.93898886e-01, 5.68425656e-01, 8.49762330e-01), Vectors.dense(-2.50949277e-01)),
+      (3, Vectors.dense(7.63485129e-01, 1.02605138e+01, 1.32617719e+00,
+        5.49682879e-01, 8.59931442e-01, 4.88677978e-02), Vectors.dense(7.63485129e-01)),
+      (4, Vectors.dense(9.34900015e-01, 4.11379043e-01, 8.65010205e+00,
+        9.23509168e-01, 1.16995043e-01, 5.91894106e-03), Vectors.dense(9.34900015e-01)),
+      (4, Vectors.dense(4.73734933e-01, -1.48321181e+00, 9.73349621e+00,
+        4.09421563e-01, 5.09375719e-01, 5.93157850e-01), Vectors.dense(4.73734933e-01)),
+      (4, Vectors.dense(3.41470679e-01, -6.88972582e-01, 9.60347938e+00,
+        3.62654055e-01, 2.43437468e-01, 7.13052838e-01), Vectors.dense(3.41470679e-01)),
+      (4, Vectors.dense(-5.29614251e-01, -1.39262856e+00, 1.01354144e+01,
+        8.24123861e-01, 5.84074506e-01, 6.54461558e-01), Vectors.dense(-5.29614251e-01)),
+      (4, Vectors.dense(-2.99454508e-01, 2.20457263e+00, 1.14586015e+01,
+        5.16336729e-01, 9.99776159e-01, 3.15769738e-01), Vectors.dense(-2.99454508e-01)))
+
+    datasetAnova = spark.createDataFrame(dataAnova).toDF("label", "features", "topFeature")
+
+    // scalastyle:off
+    /*
+    Use the following sklearn data in this test
+
+    >>> from sklearn.feature_selection import f_regression
+    >>> import numpy as np
+    >>> np.random.seed(777)
+    >>> X = np.random.rand(20, 6)
+    >>> w = np.array([0.3, 0.4, 0.5, 0, 0, 0])
+    >>> y = X @ w
+    >>> X
+    array([[0.19151945, 0.62210877, 0.43772774, 0.78535858, 0.77997581,
+            0.27259261],
+           [0.27646426, 0.80187218, 0.95813935, 0.87593263, 0.35781727,
+            0.50099513],
+           [0.68346294, 0.71270203, 0.37025075, 0.56119619, 0.50308317,
+            0.01376845],
+           [0.77282662, 0.88264119, 0.36488598, 0.61539618, 0.07538124,
+            0.36882401],
+           [0.9331401 , 0.65137814, 0.39720258, 0.78873014, 0.31683612,
+            0.56809865],
+           [0.86912739, 0.43617342, 0.80214764, 0.14376682, 0.70426097,
+            0.70458131],
+           [0.21879211, 0.92486763, 0.44214076, 0.90931596, 0.05980922,
+            0.18428708],
+           [0.04735528, 0.67488094, 0.59462478, 0.53331016, 0.04332406,
+            0.56143308],
+           [0.32966845, 0.50296683, 0.11189432, 0.60719371, 0.56594464,
+            0.00676406],
+           [0.61744171, 0.91212289, 0.79052413, 0.99208147, 0.95880176,
+            0.79196414],
+           [0.28525096, 0.62491671, 0.4780938 , 0.19567518, 0.38231745,
+            0.05387369],
+           [0.45164841, 0.98200474, 0.1239427 , 0.1193809 , 0.73852306,
+            0.58730363],
+           [0.47163253, 0.10712682, 0.22921857, 0.89996519, 0.41675354,
+            0.53585166],
+           [0.00620852, 0.30064171, 0.43689317, 0.612149  , 0.91819808,
+            0.62573667],
+           [0.70599757, 0.14983372, 0.74606341, 0.83100699, 0.63372577,
+            0.43830988],
+           [0.15257277, 0.56840962, 0.52822428, 0.95142876, 0.48035918,
+            0.50255956],
+           [0.53687819, 0.81920207, 0.05711564, 0.66942174, 0.76711663,
+            0.70811536],
+           [0.79686718, 0.55776083, 0.96583653, 0.1471569 , 0.029647  ,
+            0.59389349],
+           [0.1140657 , 0.95080985, 0.32570741, 0.19361869, 0.45781165,
+            0.92040257],
+           [0.87906916, 0.25261576, 0.34800879, 0.18258873, 0.90179605,
+            0.70652816]])
+    >>> y
+    array([0.52516321, 0.88275782, 0.67524507, 0.76734745, 0.73909458,
+           0.83628141, 0.65665506, 0.58147135, 0.35603443, 0.94534373,
+           0.57458887, 0.59026777, 0.29894977, 0.34056582, 0.64476446,
+           0.53724782, 0.5173021 , 0.94508275, 0.57739736, 0.53877145])
+    >>> f_regression(X, y)
+    (array([5.58025504,  3.98311705, 20.59605518,  0.07993376,  1.25127646,
+            0.7676937 ]),
+    array([2.96302196e-02, 6.13173918e-02, 2.54580618e-04, 7.80612726e-01,
+    2.78015517e-01, 3.92474567e-01]))
+    */
+    // scalastyle:on
+
+    val dataFRegression = Seq(
+      (0.52516321, Vectors.dense(0.19151945, 0.62210877, 0.43772774, 0.78535858, 0.77997581,
+        0.27259261), Vectors.dense(0.43772774)),
+      (0.88275782, Vectors.dense(0.27646426, 0.80187218, 0.95813935, 0.87593263, 0.35781727,
+        0.50099513), Vectors.dense(0.95813935)),
+      (0.67524507, Vectors.dense(0.68346294, 0.71270203, 0.37025075, 0.56119619, 0.50308317,
+        0.01376845), Vectors.dense(0.37025075)),
+      (0.76734745, Vectors.dense(0.77282662, 0.88264119, 0.36488598, 0.61539618, 0.07538124,
+        0.36882401), Vectors.dense(0.36488598)),
+      (0.73909458, Vectors.dense(0.9331401, 0.65137814, 0.39720258, 0.78873014, 0.31683612,
+        0.56809865), Vectors.dense(0.39720258)),
+
+      (0.83628141, Vectors.dense(0.86912739, 0.43617342, 0.80214764, 0.14376682, 0.70426097,
+        0.70458131), Vectors.dense(0.80214764)),
+      (0.65665506, Vectors.dense(0.21879211, 0.92486763, 0.44214076, 0.90931596, 0.05980922,
+        0.18428708), Vectors.dense(0.44214076)),
+      (0.58147135, Vectors.dense(0.04735528, 0.67488094, 0.59462478, 0.53331016, 0.04332406,
+        0.56143308), Vectors.dense(0.59462478)),
+      (0.35603443, Vectors.dense(0.32966845, 0.50296683, 0.11189432, 0.60719371, 0.56594464,
+        0.00676406), Vectors.dense(0.11189432)),
+      (0.94534373, Vectors.dense(0.61744171, 0.91212289, 0.79052413, 0.99208147, 0.95880176,
+        0.79196414), Vectors.dense(0.79052413)),
+
+      (0.57458887, Vectors.dense(0.28525096, 0.62491671, 0.4780938, 0.19567518, 0.38231745,
+        0.05387369), Vectors.dense(0.4780938)),
+      (0.59026777, Vectors.dense(0.45164841, 0.98200474, 0.1239427, 0.1193809, 0.73852306,
+        0.58730363), Vectors.dense(0.1239427)),
+      (0.29894977, Vectors.dense(0.47163253, 0.10712682, 0.22921857, 0.89996519, 0.41675354,
+        0.53585166), Vectors.dense(0.22921857)),
+      (0.34056582, Vectors.dense(0.00620852, 0.30064171, 0.43689317, 0.612149, 0.91819808,
+        0.62573667), Vectors.dense(0.43689317)),
+      (0.64476446, Vectors.dense(0.70599757, 0.14983372, 0.74606341, 0.83100699, 0.63372577,
+        0.43830988), Vectors.dense(0.74606341)),
+
+      (0.53724782, Vectors.dense(0.15257277, 0.56840962, 0.52822428, 0.95142876, 0.48035918,
+        0.50255956), Vectors.dense(0.52822428)),
+      (0.5173021, Vectors.dense(0.53687819, 0.81920207, 0.05711564, 0.66942174, 0.76711663,
+        0.70811536), Vectors.dense(0.05711564)),
+      (0.94508275, Vectors.dense(0.79686718, 0.55776083, 0.96583653, 0.1471569, 0.029647,
+        0.59389349), Vectors.dense(0.96583653)),
+      (0.57739736, Vectors.dense(0.1140657, 0.95080985, 0.96583653, 0.19361869, 0.45781165,
+        0.92040257), Vectors.dense(0.96583653)),
+      (0.53877145, Vectors.dense(0.87906916, 0.25261576, 0.34800879, 0.18258873, 0.90179605,
+        0.70652816), Vectors.dense(0.34800879)))
+
+    datasetFRegression = spark.createDataFrame(dataFRegression)
+      .toDF("label", "features", "topFeature")
+
+    selector1 = new UnivariateFeatureSelector()
+      .setOutputCol("filtered")
+      .setFeatureType("continuous")
+      .setLabelType("categorical")
+    selector2 = new UnivariateFeatureSelector()
+      .setOutputCol("filtered")
+      .setFeatureType("continuous")
+      .setLabelType("continuous")
+    selector3 = new UnivariateFeatureSelector()
+      .setOutputCol("filtered")
+      .setFeatureType("categorical")
+      .setLabelType("categorical")
+  }
+
+  test("params") {
+    ParamsSuite.checkParams(new UnivariateFeatureSelector())
+  }
+
+  test("Test numTopFeatures") {
+    val testParams: Seq[(UnivariateFeatureSelector, Dataset[_])] = Seq(
+      (selector1.setSelectionMode("numTopFeatures").setSelectionThreshold(1), datasetAnova),
+      (selector2.setSelectionMode("numTopFeatures").setSelectionThreshold(1), datasetFRegression),
+      (selector3.setSelectionMode("numTopFeatures").setSelectionThreshold(1), datasetChi2)
+    )
+    for ((sel, dataset) <- testParams) {
+      val model = testSelector(sel, dataset)
+      MLTestingUtils.checkCopyAndUids(sel, model)
+    }
+  }
+
+  test("Test percentile") {
+    val testParams: Seq[(UnivariateFeatureSelector, Dataset[_])] = Seq(
+      (selector1.setSelectionMode("percentile").setSelectionThreshold(0.17), datasetAnova),
+      (selector2.setSelectionMode("percentile").setSelectionThreshold(0.17), datasetFRegression),
+      (selector3.setSelectionMode("percentile").setSelectionThreshold(0.17), datasetChi2)
+    )
+    for ((sel, dataset) <- testParams) {
+      val model = testSelector(sel, dataset)
+      MLTestingUtils.checkCopyAndUids(sel, model)
+    }
+  }
+
+  test("Test fpr") {
+    val testParams: Seq[(UnivariateFeatureSelector, Dataset[_])] = Seq(
+      (selector1.setSelectionMode("fpr").setSelectionThreshold(1.0E-12), datasetAnova),
+      (selector2.setSelectionMode("fpr").setSelectionThreshold(0.01), datasetFRegression),
+      (selector3.setSelectionMode("fpr").setSelectionThreshold(0.02), datasetChi2)
+    )
+    for ((sel, dataset) <- testParams) {
+      val model = testSelector(sel, dataset)
+      MLTestingUtils.checkCopyAndUids(sel, model)
+    }
+  }
+
+  test("Test fdr") {
+    val testParams: Seq[(UnivariateFeatureSelector, Dataset[_])] = Seq(
+      (selector1.setSelectionMode("fdr").setSelectionThreshold(6.0E-12), datasetAnova),
+      (selector2.setSelectionMode("fdr").setSelectionThreshold(0.03), datasetFRegression),
+      (selector3.setSelectionMode("fdr").setSelectionThreshold(0.12), datasetChi2)
+    )
+    for ((sel, dataset) <- testParams) {
+      val model = testSelector(sel, dataset)
+      MLTestingUtils.checkCopyAndUids(sel, model)
+    }
+  }
+
+  test("Test fwe") {
+    val testParams: Seq[(UnivariateFeatureSelector, Dataset[_])] = Seq(
+      (selector1.setSelectionMode("fwe").setSelectionThreshold(6.0E-12), datasetAnova),
+      (selector2.setSelectionMode("fwe").setSelectionThreshold(0.03), datasetFRegression),
+      (selector3.setSelectionMode("fwe").setSelectionThreshold(0.12), datasetChi2)
+    )
+    for ((sel, dataset) <- testParams) {
+      val model = testSelector(sel, dataset)
+      MLTestingUtils.checkCopyAndUids(sel, model)
+    }
+  }
+
+  // use the following sklearn program to verify the test
+  // scalastyle:off
+  /*
+  import numpy as np
+  from sklearn.feature_selection import SelectFdr, f_classif
+
+  X = np.random.rand(10, 6)
+  w = np.array([5, 5, 0.0, 0, 0, 0]).reshape((-1, 1))
+  y = np.rint(0.1 * (X @ w)).flatten()
+  print(X)
+  print(y)
+
+  F, p = f_classif(X, y)
+  print('F', F)
+  print('p', p)
+  selected = SelectFdr(f_classif, alpha=0.25).fit(X, y).get_support(True)
+
+  print(selected)
+  */
+
+  /*
+  sklearn result
+  [[0.92166066 0.82295823 0.31276624 0.63069973 0.64679537 0.94138368]
+  [0.47027783 0.74907889 0.43660557 0.93212582 0.5654378  0.531748  ]
+  [0.67771108 0.23926502 0.66906295 0.73117095 0.67340005 0.52864934]
+  [0.84565144 0.28050298 0.94137135 0.42479664 0.21600724 0.98956871]
+  [0.58818255 0.32223507 0.13727654 0.80948059 0.94617741 0.48460179]
+  [0.59528639 0.75838511 0.98648654 0.65561948 0.83818237 0.30178127]
+  [0.00264811 0.46492597 0.71428557 0.94708987 0.54587827 0.9484639 ]
+  [0.94604186 0.43187098 0.42135172 0.77256283 0.44334613 0.1514674 ]
+  [0.45694004 0.00273459 0.14580367 0.74278963 0.57819284 0.99413419]
+  [0.02256925 0.56136702 0.0629738  0.64130602 0.01536191 0.56638321]]
+  [1. 1. 0. 1. 0. 1. 0. 1. 0. 0.]
+  F [5.66456136e+00 4.08120006e+00 1.85418412e+00 8.67095392e-01
+  2.87769237e-03 3.66010633e-01]
+  p [0.04454332 0.07803464 0.21040406 0.37900428 0.95853411 0.56195058]
+  [0 1]
+
+  [[0.27976711 0.48397753 0.18451698 0.59844137 0.01459805 0.98895542]
+  [0.97192726 0.46737333 0.08048093 0.38253056 0.04776121 0.55949538]
+  [0.62559834 0.44102192 0.19199043 0.959706   0.5332824  0.78621594]
+  [0.91649448 0.76501992 0.58678528 0.75239909 0.33179368 0.00893317]
+  [0.14086806 0.21876364 0.31767297 0.53061653 0.02786653 0.20021944]
+  [0.15214833 0.03028593 0.12326784 0.55663152 0.8333684  0.76923807]
+  [0.88178287 0.8492688  0.29417221 0.98122401 0.44103191 0.32709781]
+  [0.06686689 0.05834763 0.41316273 0.92850555 0.77308549 0.2931857 ]
+  [0.94747449 0.78336777 0.76096282 0.52368192 0.64814324 0.60455684]
+  [0.83382261 0.31412713 0.62490246 0.43896432 0.35390503 0.02316754]]
+  [0. 1. 1. 1. 0. 0. 1. 0. 1. 1.]
+  F [9.22227201e+01 8.36710241e+00 1.22217112e+00 1.63526175e-02
+  8.91954821e-03 6.44534477e-01]
+  p [1.14739663e-05 2.01189199e-02 3.01070031e-01 9.01402125e-01
+  9.27079623e-01 4.45267639e-01]
+  [0 1]
+  */
+  // scalastyle:on
+  test("Test selectIndicesFromPValues f_classif") {
+    val data_f_classif1 = Seq(
+      (1, Vectors.dense(0.92166066, 0.82295823, 0.31276624, 0.63069973, 0.64679537, 0.94138368),
+        Vectors.dense(0.92166066, 0.82295823)),
+      (1, Vectors.dense(0.47027783, 0.74907889, 0.43660557, 0.93212582, 0.5654378, 0.531748),
+        Vectors.dense(0.47027783, 0.74907889)),
+      (0, Vectors.dense(0.67771108, 0.23926502, 0.66906295, 0.73117095, 0.67340005, 0.52864934),
+        Vectors.dense(0.67771108, 0.23926502)),
+      (1, Vectors.dense(0.84565144, 0.28050298, 0.94137135, 0.42479664, 0.21600724, 0.98956871),
+        Vectors.dense(0.84565144, 0.28050298)),
+      (0, Vectors.dense(0.58818255, 0.32223507, 0.13727654, 0.80948059, 0.94617741, 0.48460179),
+        Vectors.dense(0.58818255, 0.32223507)),
+      (1, Vectors.dense(0.59528639, 0.75838511, 0.98648654, 0.65561948, 0.83818237, 0.30178127),
+        Vectors.dense(0.59528639, 0.75838511)),
+      (0, Vectors.dense(0.00264811, 0.46492597, 0.71428557, 0.94708987, 0.54587827, 0.9484639),
+        Vectors.dense(0.00264811, 0.46492597)),
+      (1, Vectors.dense(0.94604186, 0.43187098, 0.42135172, 0.77256283, 0.44334613, 0.1514674),
+        Vectors.dense(0.94604186, 0.43187098)),
+      (0, Vectors.dense(0.45694004, 0.00273459, 0.14580367, 0.74278963, 0.57819284, 0.99413419),
+        Vectors.dense(0.45694004, 0.00273459)),
+      (0, Vectors.dense(0.02256925, 0.56136702, 0.0629738, 0.64130602, 0.01536191, 0.56638321),
+        Vectors.dense(0.02256925, 0.56136702)))
+
+    val data_f_classif2 = Seq(
+      (0, Vectors.dense(0.27976711, 0.48397753, 0.18451698, 0.59844137, 0.01459805, 0.98895542),
+        Vectors.dense(0.27976711, 0.48397753)),
+      (1, Vectors.dense(0.97192726, 0.46737333, 0.08048093, 0.38253056, 0.04776121, 0.55949538),
+        Vectors.dense(0.97192726, 0.46737333)),
+      (1, Vectors.dense(0.62559834, 0.44102192, 0.19199043, 0.959706, 0.5332824, 0.78621594),
+        Vectors.dense(0.62559834, 0.44102192)),
+      (1, Vectors.dense(0.91649448, 0.76501992, 0.58678528, 0.75239909, 0.33179368, 0.00893317),
+        Vectors.dense(0.91649448, 0.76501992)),
+      (0, Vectors.dense(0.14086806, 0.21876364, 0.31767297, 0.53061653, 0.02786653, 0.20021944),
+        Vectors.dense(0.14086806, 0.21876364)),
+      (0, Vectors.dense(0.15214833, 0.03028593, 0.12326784, 0.55663152, 0.8333684, 0.76923807),
+        Vectors.dense(0.15214833, 0.03028593)),
+      (1, Vectors.dense(0.88178287, 0.8492688, 0.29417221, 0.98122401, 0.44103191, 0.32709781),
+        Vectors.dense(0.88178287, 0.8492688)),
+      (0, Vectors.dense(0.06686689, 0.05834763, 0.41316273, 0.92850555, 0.77308549, 0.2931857),
+        Vectors.dense(0.06686689, 0.05834763)),
+      (1, Vectors.dense(0.94747449, 0.78336777, 0.76096282, 0.52368192, 0.64814324, 0.60455684),
+        Vectors.dense(0.94747449, 0.78336777)),
+      (1, Vectors.dense(0.83382261, 0.31412713, 0.62490246, 0.43896432, 0.35390503, 0.02316754),
+        Vectors.dense(0.83382261, 0.31412713)))
+
+    val dataset_f_classification1 =
+      spark.createDataFrame(data_f_classif1).toDF("label", "features", "topFeature")
+
+    val dataset_f_classification2 =
+      spark.createDataFrame(data_f_classif2).toDF("label", "features", "topFeature")
+
+    val resultDF1 = ANOVATest.test(dataset_f_classification1.toDF, "features", "label", true)
+    val resultDF2 = ANOVATest.test(dataset_f_classification2.toDF, "features", "label", true)
+    val selector = new UnivariateFeatureSelector()
+      .setOutputCol("filtered")
+      .setFeatureType("continuous")
+      .setLabelType("categorical")
+    val indices1 = selector.selectIndicesFromPValues(6, resultDF1, "fdr", 0.25)
+    val indices2 = selector.selectIndicesFromPValues(6, resultDF2, "fdr", 0.25)
+    assert(indices1(0) === 0 && indices1(1) === 1)
+    assert(indices2(0) === 0 && indices2(1) === 1)
+  }
+
+  // use the following sklearn program to verify the test
+  // scalastyle:off
+  /*
+  import numpy as np
+  from sklearn.feature_selection import SelectFdr, f_regression
+
+  X = np.random.rand(10, 6)
+  w = np.array([5, 5, 0.0, 0, 0, 0]).reshape((-1, 1))
+  y = (X @ w).flatten()
+  print(X)
+  print(y)
+
+  F, p = f_regression(X, y)
+  print('F', F)
+  print('p', p)
+  selected = SelectFdr(f_regression, alpha=0.1).fit(X, y).get_support(True)
+
+  print(selected)
+  */
+
+  /*
+  sklearn result
+  [[5.19537247e-01 4.53144603e-01 2.10190418e-01 9.76237361e-01
+  9.05792824e-01 9.34081024e-01]
+  [8.68906163e-01 5.49099467e-01 6.73567960e-01 3.94736897e-01
+  9.98764158e-01 1.14285918e-01]
+  [2.56211244e-01 5.21857152e-01 6.55000402e-01 4.81092256e-01
+  4.05802734e-02 1.59811005e-01]
+  [9.03076723e-01 1.80316576e-01 8.13131160e-01 6.92327901e-01
+  4.77693321e-01 2.17284784e-01]
+  [4.75926597e-01 6.80511651e-01 9.55843875e-01 1.52627108e-01
+  1.72766587e-01 6.45234673e-01]
+  [6.05829005e-01 8.43879811e-01 4.48596383e-01 7.25003439e-01
+  2.83962640e-02 5.14414827e-01]
+  [8.57631869e-01 1.18279868e-01 2.84428492e-01 8.51544596e-01
+  1.33220409e-02 1.87044251e-01]
+  [2.43360773e-01 4.83288948e-02 1.10430569e-01 4.33097852e-01
+  5.63452248e-02 8.24333214e-01]
+  [2.18226531e-01 5.28477779e-01 3.01852956e-01 6.31664822e-04
+  8.97463990e-01 8.25297034e-01]
+  [6.95170305e-01 7.35775299e-01 4.32188618e-01 2.26744166e-01
+  5.13186095e-01 2.91635657e-01]]
+  [4.86340925 7.09002815 3.89034198 5.4169665  5.78219124 7.24854408
+  4.87955868 1.45844834 3.73352155 7.15472802]
+  F [6.79932587 7.09311449 2.25262252 0.02652918 0.40812054 2.14464201]
+  p [0.03124895 0.02865887 0.17178184 0.87465381 0.54077957 0.18122753]
+  [0 1]
+  */
+
+  /*
+  sklearn result
+  [[0.21557113 0.66070242 0.89964323 0.1569332  0.84097522 0.61614986]
+  [0.14790391 0.40356507 0.2973803  0.53051143 0.35408457 0.88180598]
+  [0.39333276 0.42790148 0.41415147 0.82478069 0.57201431 0.49972278]
+  [0.46189165 0.460305   0.21054573 0.16588781 0.72898672 0.41290627]
+  [0.42527082 0.83902909 0.97275171 0.76947383 0.24470714 0.57847281]
+  [0.56185556 0.94463811 0.97741409 0.27233834 0.76460529 0.53085766]
+  [0.5828694  0.45827703 0.49305311 0.13803643 0.18242319 0.14182515]
+  [0.98848811 0.43453809 0.11712213 0.4849829  0.06431555 0.76125387]
+  [0.1181108  0.43820753 0.49576967 0.75729578 0.35355208 0.48165022]
+  [0.44250624 0.24310088 0.03976366 0.24023351 0.91659502 0.75260252]]
+  [4.38136774 2.7573449  4.10617119 4.61098326 6.32149954 7.53246836
+  5.20573215 7.11513098 2.78159163 3.42803558]
+  F [11.90962327  6.49595546  1.51054886  0.17751367  0.40829523  0.1797005 ]
+  p [0.0086816  0.03424301 0.25397764 0.68461076 0.54069506 0.68279904]
+  [0]
+  */
+  // scalastyle:on
+  test("Test selectIndicesFromPValues f_regression") {
+    val data_f_regression1 = Seq(
+      (4.86340925, Vectors.dense(5.19537247e-01, 4.53144603e-01, 2.10190418e-01, 9.76237361e-01,
+        9.05792824e-01, 9.34081024e-01), Vectors.dense(5.19537247e-01, 4.53144603e-01)),
+      (7.09002815, Vectors.dense(8.68906163e-01, 5.49099467e-01, 6.73567960e-01, 3.94736897e-01,
+        9.98764158e-01, 1.14285918e-01), Vectors.dense(8.68906163e-01, 5.49099467e-01)),
+      (3.89034198, Vectors.dense(2.56211244e-01, 5.21857152e-01, 6.55000402e-01, 4.81092256e-01,
+        4.05802734e-02, 1.59811005e-01), Vectors.dense(2.56211244e-01, 5.21857152e-01)),
+      (5.4169665, Vectors.dense(9.03076723e-01, 1.80316576e-01, 8.13131160e-01, 6.92327901e-01,
+        4.77693321e-01, 2.17284784e-01), Vectors.dense(9.03076723e-01, 1.80316576e-01)),
+      (5.78219124, Vectors.dense(4.75926597e-01, 6.80511651e-01, 9.55843875e-01, 1.52627108e-01,
+        1.72766587e-01, 6.45234673e-01), Vectors.dense(4.75926597e-01, 6.80511651e-01)),
+      (7.24854408, Vectors.dense(6.05829005e-01, 8.43879811e-01, 4.48596383e-01, 7.25003439e-01,
+        2.83962640e-02, 5.14414827e-01), Vectors.dense(6.05829005e-01, 8.43879811e-01)),
+      (4.87955868, Vectors.dense(8.57631869e-01, 1.18279868e-01, 2.84428492e-01, 8.51544596e-01,
+        1.33220409e-02, 1.87044251e-01), Vectors.dense(8.57631869e-01, 1.18279868e-01)),
+      (1.45844834, Vectors.dense(2.43360773e-01, 4.83288948e-02, 1.10430569e-01, 4.33097852e-01,
+        5.63452248e-02, 8.24333214e-01), Vectors.dense(2.43360773e-01, 4.83288948e-02)),
+      (3.73352155, Vectors.dense(2.18226531e-01, 5.28477779e-01, 3.01852956e-01, 6.31664822e-04,
+        8.97463990e-01, 8.25297034e-01), Vectors.dense(2.18226531e-01, 5.28477779e-01)),
+      (7.15472802, Vectors.dense(6.95170305e-01, 7.35775299e-01, 4.32188618e-01, 2.26744166e-01,
+        5.13186095e-01, 2.91635657e-01), Vectors.dense(6.95170305e-01, 7.35775299e-01)))
+
+    val data_f_regression2 = Seq(
+      (4.38136774, Vectors.dense(0.21557113, 0.66070242, 0.89964323, 0.1569332, 0.84097522,
+        0.61614986), Vectors.dense(0.21557113)),
+      (2.7573449, Vectors.dense(0.14790391, 0.40356507, 0.2973803, 0.53051143, 0.35408457,
+        0.88180598), Vectors.dense(0.14790391)),
+      (4.10617119, Vectors.dense(0.39333276, 0.42790148, 0.41415147, 0.82478069, 0.57201431,
+        0.49972278), Vectors.dense(0.39333276)),
+      (4.61098326, Vectors.dense(0.46189165, 0.460305, 0.21054573, 0.16588781, 0.72898672,
+        0.41290627), Vectors.dense(0.46189165)),
+      (6.32149954, Vectors.dense(0.42527082, 0.83902909, 0.97275171, 0.76947383, 0.24470714,
+        0.57847281), Vectors.dense(0.42527082)),
+      (7.53246836, Vectors.dense(0.56185556, 0.94463811, 0.97741409, 0.27233834, 0.76460529,
+        0.53085766), Vectors.dense(0.56185556)),
+      (5.20573215, Vectors.dense(0.5828694, 0.45827703, 0.49305311, 0.13803643, 0.18242319,
+        0.14182515), Vectors.dense(0.5828694)),
+      (7.11513098, Vectors.dense(0.98848811, 0.43453809, 0.11712213, 0.4849829, 0.06431555,
+        0.76125387), Vectors.dense(0.98848811)),
+      (2.78159163, Vectors.dense(0.1181108, 0.43820753, 0.49576967, 0.75729578, 0.35355208,
+        0.48165022), Vectors.dense(0.1181108)),
+      (3.42803558, Vectors.dense(0.44250624, 0.24310088, 0.03976366, 0.24023351, 0.91659502,
+        0.75260252), Vectors.dense(0.44250624)))
+
+    val dataset_f_regression1 =
+      spark.createDataFrame(data_f_regression1).toDF("label", "features", "topFeature")
+
+    val dataset_f_regression2 =
+      spark.createDataFrame(data_f_regression2).toDF("label", "features", "topFeature")
+
+    val resultDF1 = FValueTest.test(dataset_f_regression1.toDF, "features", "label", true)
+    val resultDF2 = FValueTest.test(dataset_f_regression2.toDF, "features", "label", true)
+    val selector = new UnivariateFeatureSelector()
+      .setOutputCol("filtered")
+      .setFeatureType("continuous")
+      .setLabelType("continuous")
+    val indices1 = selector.selectIndicesFromPValues(6, resultDF1, "fdr", 0.1)
+    val indices2 = selector.selectIndicesFromPValues(6, resultDF2, "fdr", 0.1)
+    assert(indices1(0) === 1 && indices1(1) === 0)
+    assert(indices2(0) === 0)
+  }
+
+  test("read/write") {
+    def checkModelData(
+        model: UnivariateFeatureSelectorModel,
+        model2: UnivariateFeatureSelectorModel): Unit = {
+      assert(model.selectedFeatures === model2.selectedFeatures)
+    }
+    val selector = new UnivariateFeatureSelector()
+      .setFeatureType("continuous")
+      .setLabelType("categorical")
+    testEstimatorAndModelReadWrite(selector, datasetAnova,
+      UnivariateFeatureSelectorSuite.allParamSettings,
+      UnivariateFeatureSelectorSuite.allParamSettings, checkModelData)
+  }
+
+  private def testSelector(
+      selector: UnivariateFeatureSelector, data: Dataset[_]): UnivariateFeatureSelectorModel = {
+    val selectorModel = selector.fit(data)
+    testTransformer[(Double, Vector, Vector)](data.toDF(), selectorModel,
+      "filtered", "topFeature") {
+      case Row(vec1: Vector, vec2: Vector) =>
+        assert(vec1 ~== vec2 absTol 1e-1)
+    }
+    selectorModel
+  }
+}
+
+object UnivariateFeatureSelectorSuite {
+
+  /**
+   * Mapping from all Params to valid settings which differ from the defaults.
+   * This is useful for tests which need to exercise all Params, such as save/load.
+   * This excludes input columns to simplify some tests.
+   */
+  val allParamSettings: Map[String, Any] = Map(
+    "selectionMode" -> "percentile",
+    "selectionThreshold" -> 0.12,
+    "outputCol" -> "myOutput"
+  )
+}
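
For reference, the hard-coded F statistics and p-values in the suite above come from
scikit-learn, as the embedded comment programs show. The f_regression block is the fully
reproducible one (it fixes seed 777), so that verification can be regenerated with the
consolidated sketch below; the f_classif and SelectFdr programs in the comments are
analogous but print their unseeded inputs instead:

```python
import numpy as np
from sklearn.feature_selection import f_regression

# Same construction as the comment block in the suite: seed 777,
# 20 samples x 6 features, label depends only on the first three features.
np.random.seed(777)
X = np.random.rand(20, 6)
w = np.array([0.3, 0.4, 0.5, 0, 0, 0])
y = X @ w

F, p = f_regression(X, y)
print(F)  # ~ [5.58, 3.98, 20.60, 0.08, 1.25, 0.77]
print(p)  # feature 2 has by far the smallest p-value, so it is the top feature
```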
diff --git a/python/docs/source/reference/pyspark.ml.rst b/python/docs/source/reference/pyspark.ml.rst
index cc90459..7837d60 100644
--- a/python/docs/source/reference/pyspark.ml.rst
+++ b/python/docs/source/reference/pyspark.ml.rst
@@ -61,8 +61,6 @@ Feature
     :template: autosummary/class_with_docs.rst
     :toctree: api/
 
-    ANOVASelector
-    ANOVASelectorModel
     Binarizer
     BucketedRandomProjectionLSH
     BucketedRandomProjectionLSHModel
@@ -74,8 +72,6 @@ Feature
     DCT
     ElementwiseProduct
     FeatureHasher
-    FValueSelector
-    FValueSelectorModel
     HashingTF
     IDF
     IDFModel
@@ -109,6 +105,8 @@ Feature
     StringIndexer
     StringIndexerModel
     Tokenizer
+    UnivariateFeatureSelector
+    UnivariateFeatureSelectorModel
     VarianceThresholdSelector
     VarianceThresholdSelectorModel
     VectorAssembler
@@ -272,10 +270,8 @@ Statistics
     :template: autosummary/class_with_docs.rst
     :toctree: api/
 
-    ANOVATest
     ChiSquareTest
     Correlation
-    FValueTest
     KolmogorovSmirnovTest
     MultivariateGaussian
     Summarizer
diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index 546c463..f9d22ba 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -24,8 +24,7 @@ from pyspark.ml.util import JavaMLReadable, JavaMLWritable
 from pyspark.ml.wrapper import JavaEstimator, JavaModel, JavaParams, JavaTransformer, _jvm
 from pyspark.ml.common import inherit_doc
 
-__all__ = ['ANOVASelector', 'ANOVASelectorModel',
-           'Binarizer',
+__all__ = ['Binarizer',
            'BucketedRandomProjectionLSH', 'BucketedRandomProjectionLSHModel',
            'Bucketizer',
            'ChiSqSelector', 'ChiSqSelectorModel',
@@ -33,7 +32,6 @@ __all__ = ['ANOVASelector', 'ANOVASelectorModel',
            'DCT',
            'ElementwiseProduct',
            'FeatureHasher',
-           'FValueSelector', 'FValueSelectorModel',
            'HashingTF',
            'IDF', 'IDFModel',
            'Imputer', 'ImputerModel',
@@ -56,6 +54,7 @@ __all__ = ['ANOVASelector', 'ANOVASelectorModel',
            'StopWordsRemover',
            'StringIndexer', 'StringIndexerModel',
            'Tokenizer',
+           'UnivariateFeatureSelector', 'UnivariateFeatureSelectorModel',
            'VarianceThresholdSelector', 'VarianceThresholdSelectorModel',
            'VectorAssembler',
            'VectorIndexer', 'VectorIndexerModel',
@@ -5413,106 +5412,6 @@ class _SelectorModel(JavaModel, _SelectorParams):
 
 
 @inherit_doc
-class ANOVASelector(_Selector, JavaMLReadable, JavaMLWritable):
-    """
-    ANOVA F-value Classification selector, which selects continuous features to use for predicting
-    a categorical label.
-    The selector supports different selection methods: `numTopFeatures`, `percentile`, `fpr`,
-    `fdr`, `fwe`.
-
-    - `numTopFeatures` chooses a fixed number of top features according to a F value
-      classification test.
-    - `percentile` is similar but chooses a fraction of all features
-      instead of a fixed number.
-    - `fpr` chooses all features whose p-values are below a threshold,
-      thus controlling the false positive rate of selection.
-    - `fdr` uses the `Benjamini-Hochberg procedure \
-      <https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure>`_
-      to choose all features whose false discovery rate is below a threshold.
-    - `fwe` chooses all features whose p-values are below a threshold. The threshold is scaled by
-      1 / `numFeatures`, thus controlling the family-wise error rate of selection.
-
-    By default, the selection method is `numTopFeatures`, with the default number of top features
-    set to 50.
-
-    .. versionadded:: 3.1.0
-
-    Examples
-    --------
-    >>> from pyspark.ml.linalg import Vectors
-    >>> df = spark.createDataFrame(
-    ...    [(Vectors.dense([1.7, 4.4, 7.6, 5.8, 9.6, 2.3]), 3.0),
-    ...     (Vectors.dense([8.8, 7.3, 5.7, 7.3, 2.2, 4.1]), 2.0),
-    ...     (Vectors.dense([1.2, 9.5, 2.5, 3.1, 8.7, 2.5]), 1.0),
-    ...     (Vectors.dense([3.7, 9.2, 6.1, 4.1, 7.5, 3.8]), 2.0),
-    ...     (Vectors.dense([8.9, 5.2, 7.8, 8.3, 5.2, 3.0]), 4.0),
-    ...     (Vectors.dense([7.9, 8.5, 9.2, 4.0, 9.4, 2.1]), 4.0)],
-    ...    ["features", "label"])
-    >>> selector = ANOVASelector(numTopFeatures=1, outputCol="selectedFeatures")
-    >>> model = selector.fit(df)
-    >>> model.getFeaturesCol()
-    'features'
-    >>> model.setFeaturesCol("features")
-    ANOVASelectorModel...
-    >>> model.transform(df).head().selectedFeatures
-    DenseVector([7.6])
-    >>> model.selectedFeatures
-    [2]
-    >>> anovaSelectorPath = temp_path + "/anova-selector"
-    >>> selector.save(anovaSelectorPath)
-    >>> loadedSelector = ANOVASelector.load(anovaSelectorPath)
-    >>> loadedSelector.getNumTopFeatures() == selector.getNumTopFeatures()
-    True
-    >>> modelPath = temp_path + "/anova-selector-model"
-    >>> model.save(modelPath)
-    >>> loadedModel = ANOVASelectorModel.load(modelPath)
-    >>> loadedModel.selectedFeatures == model.selectedFeatures
-    True
-    >>> loadedModel.transform(df).take(1) == model.transform(df).take(1)
-    True
-    """
-
-    @keyword_only
-    def __init__(self, *, numTopFeatures=50, featuresCol="features", outputCol=None,
-                 labelCol="label", selectorType="numTopFeatures", percentile=0.1, fpr=0.05,
-                 fdr=0.05, fwe=0.05):
-        """
-        __init__(self, \\*, numTopFeatures=50, featuresCol="features", outputCol=None, \
-                 labelCol="label", selectorType="numTopFeatures", percentile=0.1, fpr=0.05, \
-                 fdr=0.05, fwe=0.05)
-        """
-        super(ANOVASelector, self).__init__()
-        self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.ANOVASelector", self.uid)
-        kwargs = self._input_kwargs
-        self.setParams(**kwargs)
-
-    @keyword_only
-    @since("3.1.0")
-    def setParams(self, *, numTopFeatures=50, featuresCol="features", outputCol=None,
-                  labelCol="labels", selectorType="numTopFeatures", percentile=0.1, fpr=0.05,
-                  fdr=0.05, fwe=0.05):
-        """
-        setParams(self, \\*, numTopFeatures=50, featuresCol="features", outputCol=None, \
-                  labelCol="labels", selectorType="numTopFeatures", percentile=0.1, fpr=0.05, \
-                  fdr=0.05, fwe=0.05)
-        Sets params for this ANOVASelector.
-        """
-        kwargs = self._input_kwargs
-        return self._set(**kwargs)
-
-    def _create_model(self, java_model):
-        return ANOVASelectorModel(java_model)
-
-
-class ANOVASelectorModel(_SelectorModel, JavaMLReadable, JavaMLWritable):
-    """
-    Model fitted by :py:class:`ANOVASelector`.
-
-    .. versionadded:: 3.1.0
-    """
-
-
-@inherit_doc
 class ChiSqSelector(_Selector, JavaMLReadable, JavaMLWritable):
     """
     Chi-Squared feature selection, which selects categorical features to use for predicting a
@@ -5538,6 +5437,9 @@ class ChiSqSelector(_Selector, JavaMLReadable, JavaMLWritable):
     By default, the selection method is `numTopFeatures`, with the default number of top features
     set to 50.
 
+    .. deprecated:: 3.1.0
+        Use :py:class:`UnivariateFeatureSelector` instead.
+
     .. versionadded:: 2.0.0
 
     Examples
@@ -5613,110 +5515,6 @@ class ChiSqSelectorModel(_SelectorModel, JavaMLReadable, JavaMLWritable):
 
 
 @inherit_doc
-class FValueSelector(_Selector, JavaMLReadable, JavaMLWritable):
-    """
-    F Value Regression feature selector, which selects continuous features to use for predicting a
-    continuous label.
-    The selector supports different selection methods: `numTopFeatures`, `percentile`, `fpr`,
-    `fdr`, `fwe`.
-
-     * `numTopFeatures` chooses a fixed number of top features according to a F value
-        regression test.
-
-     * `percentile` is similar but chooses a fraction of all features
-       instead of a fixed number.
-
-     * `fpr` chooses all features whose p-values are below a threshold,
-       thus controlling the false positive rate of selection.
-
-     * `fdr` uses the `Benjamini-Hochberg procedure <https://en.wikipedia.org/wiki/
-       False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure>`_
-       to choose all features whose false discovery rate is below a threshold.
-
-     * `fwe` chooses all features whose p-values are below a threshold. The threshold is scaled by
-       1/numFeatures, thus controlling the family-wise error rate of selection.
-
-    By default, the selection method is `numTopFeatures`, with the default number of top features
-    set to 50.
-
-    .. versionadded:: 3.1.0
-
-    Examples
-    --------
-    >>> from pyspark.ml.linalg import Vectors
-    >>> df = spark.createDataFrame(
-    ...    [(Vectors.dense([6.0, 7.0, 0.0, 7.0, 6.0, 0.0]), 4.6),
-    ...     (Vectors.dense([0.0, 9.0, 6.0, 0.0, 5.0, 9.0]), 6.6),
-    ...     (Vectors.dense([0.0, 9.0, 3.0, 0.0, 5.0, 5.0]), 5.1),
-    ...     (Vectors.dense([0.0, 9.0, 8.0, 5.0, 6.0, 4.0]), 7.6),
-    ...     (Vectors.dense([8.0, 9.0, 6.0, 5.0, 4.0, 4.0]), 9.0),
-    ...     (Vectors.dense([8.0, 9.0, 6.0, 4.0, 0.0, 0.0]), 9.0)],
-    ...    ["features", "label"])
-    >>> selector = FValueSelector(numTopFeatures=1, outputCol="selectedFeatures")
-    >>> model = selector.fit(df)
-    >>> model.getFeaturesCol()
-    'features'
-    >>> model.setFeaturesCol("features")
-    FValueSelectorModel...
-    >>> model.transform(df).head().selectedFeatures
-    DenseVector([0.0])
-    >>> model.selectedFeatures
-    [2]
-    >>> fvalueSelectorPath = temp_path + "/fvalue-selector"
-    >>> selector.save(fvalueSelectorPath)
-    >>> loadedSelector = FValueSelector.load(fvalueSelectorPath)
-    >>> loadedSelector.getNumTopFeatures() == selector.getNumTopFeatures()
-    True
-    >>> modelPath = temp_path + "/fvalue-selector-model"
-    >>> model.save(modelPath)
-    >>> loadedModel = FValueSelectorModel.load(modelPath)
-    >>> loadedModel.selectedFeatures == model.selectedFeatures
-    True
-    >>> loadedModel.transform(df).take(1) == model.transform(df).take(1)
-    True
-    """
-
-    @keyword_only
-    def __init__(self, *, numTopFeatures=50, featuresCol="features", outputCol=None,
-                 labelCol="label", selectorType="numTopFeatures", percentile=0.1, fpr=0.05,
-                 fdr=0.05, fwe=0.05):
-        """
-        __init__(self, \\*, numTopFeatures=50, featuresCol="features", outputCol=None, \
-                 labelCol="label", selectorType="numTopFeatures", percentile=0.1, fpr=0.05, \
-                 fdr=0.05, fwe=0.05)
-        """
-        super(FValueSelector, self).__init__()
-        self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.FValueSelector", self.uid)
-        kwargs = self._input_kwargs
-        self.setParams(**kwargs)
-
-    @keyword_only
-    @since("3.1.0")
-    def setParams(self, *, numTopFeatures=50, featuresCol="features", outputCol=None,
-                  labelCol="labels", selectorType="numTopFeatures", percentile=0.1, fpr=0.05,
-                  fdr=0.05, fwe=0.05):
-        """
-        setParams(self, \\*, numTopFeatures=50, featuresCol="features", outputCol=None, \
-                  labelCol="labels", selectorType="numTopFeatures", percentile=0.1, fpr=0.05, \
-                  fdr=0.05, fwe=0.05)
-        Sets params for this FValueSelector.
-        """
-        kwargs = self._input_kwargs
-        return self._set(**kwargs)
-
-    def _create_model(self, java_model):
-        return FValueSelectorModel(java_model)
-
-
-class FValueSelectorModel(_SelectorModel, JavaMLReadable, JavaMLWritable):
-    """
-    Model fitted by :py:class:`FValueSelector`.
-
-    .. versionadded:: 3.1.0
-    """
-
-
-@inherit_doc
 class VectorSizeHint(JavaTransformer, HasInputCol, HasHandleInvalid, JavaMLReadable,
                      JavaMLWritable):
     """
@@ -5952,6 +5750,243 @@ class VarianceThresholdSelectorModel(JavaModel, _VarianceThresholdSelectorParams
         return self._call_java("selectedFeatures")
 
 
+class _UnivariateFeatureSelectorParams(HasFeaturesCol, HasOutputCol, HasLabelCol):
+    """
+    Params for :py:class:`UnivariateFeatureSelector` and
+    :py:class:`UnivariateFeatureSelectorModel`.
+
+    .. versionadded:: 3.1.0
+    """
+
+    featureType = Param(Params._dummy(), "featureType",
+                        "The feature type. " +
+                        "Supported options: categorical, continuous.",
+                        typeConverter=TypeConverters.toString)
+
+    labelType = Param(Params._dummy(), "labelType",
+                      "The label type. " +
+                      "Supported options: categorical, continuous.",
+                      typeConverter=TypeConverters.toString)
+
+    selectionMode = Param(Params._dummy(), "selectionMode",
+                          "The selection mode. " +
+                          "Supported options: numTopFeatures (default), percentile, fpr, " +
+                          "fdr, fwe.",
+                          typeConverter=TypeConverters.toString)
+
+    selectionThreshold = Param(Params._dummy(), "selectionThreshold", "The upper bound of the " +
+                               "features that selector will select.",
+                               typeConverter=TypeConverters.toFloat)
+
+    def __init__(self, *args):
+        super(_UnivariateFeatureSelectorParams, self).__init__(*args)
+        self._setDefault(selectionMode="numTopFeatures")
+
+    @since("3.1.1")
+    def getFeatureType(self):
+        """
+        Gets the value of featureType or its default value.
+        """
+        return self.getOrDefault(self.featureType)
+
+    @since("3.1.1")
+    def getLabelType(self):
+        """
+        Gets the value of labelType or its default value.
+        """
+        return self.getOrDefault(self.labelType)
+
+    @since("3.1.1")
+    def getSelectionMode(self):
+        """
+        Gets the value of selectionMode or its default value.
+        """
+        return self.getOrDefault(self.selectionMode)
+
+    @since("3.1.1")
+    def getSelectionThreshold(self):
+        """
+        Gets the value of selectionThreshold or its default value.
+        """
+        return self.getOrDefault(self.selectionThreshold)
+
+
+@inherit_doc
+class UnivariateFeatureSelector(JavaEstimator, _UnivariateFeatureSelectorParams, JavaMLReadable,
+                                JavaMLWritable):
+    """
+    UnivariateFeatureSelector selects features based on a univariate statistical test.
+    The user sets `featureType` and `labelType`, and Spark picks the score function to use
+    based on the specified `featureType` and `labelType`.
+
+    The following combinations of `featureType` and `labelType` are supported:
+
+    - `featureType` `categorical` and `labelType` `categorical`: Spark uses chi2.
+    - `featureType` `continuous` and `labelType` `categorical`: Spark uses f_classif.
+    - `featureType` `continuous` and `labelType` `continuous`: Spark uses f_regression.
+
+    The `UnivariateFeatureSelector` supports different selection modes: `numTopFeatures`,
+    `percentile`, `fpr`, `fdr`, `fwe`.
+
+    - `numTopFeatures` chooses a fixed number of top features according to a hypothesis
+      test.
+    - `percentile` is similar but chooses a fraction of all features
+      instead of a fixed number.
+    - `fpr` chooses all features whose p-values are below a threshold,
+      thus controlling the false positive rate of selection.
+    - `fdr` uses the `Benjamini-Hochberg procedure \
+      <https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure>`_
+      to choose all features whose false discovery rate is below a threshold.
+    - `fwe` chooses all features whose p-values are below a threshold. The threshold is scaled by
+      1 / `numFeatures`, thus controlling the family-wise error rate of selection.
+
+    By default, the selection mode is `numTopFeatures`.
+
+    .. versionadded:: 3.1.1
+
+    Examples
+    --------
+    >>> from pyspark.ml.linalg import Vectors
+    >>> df = spark.createDataFrame(
+    ...    [(Vectors.dense([1.7, 4.4, 7.6, 5.8, 9.6, 2.3]), 3.0),
+    ...     (Vectors.dense([8.8, 7.3, 5.7, 7.3, 2.2, 4.1]), 2.0),
+    ...     (Vectors.dense([1.2, 9.5, 2.5, 3.1, 8.7, 2.5]), 1.0),
+    ...     (Vectors.dense([3.7, 9.2, 6.1, 4.1, 7.5, 3.8]), 2.0),
+    ...     (Vectors.dense([8.9, 5.2, 7.8, 8.3, 5.2, 3.0]), 4.0),
+    ...     (Vectors.dense([7.9, 8.5, 9.2, 4.0, 9.4, 2.1]), 4.0)],
+    ...    ["features", "label"])
+    >>> selector = UnivariateFeatureSelector(outputCol="selectedFeatures")
+    >>> selector.setFeatureType("continuous").setLabelType("categorical").setSelectionThreshold(1)
+    UnivariateFeatureSelector...
+    >>> model = selector.fit(df)
+    >>> model.getFeaturesCol()
+    'features'
+    >>> model.setFeaturesCol("features")
+    UnivariateFeatureSelectorModel...
+    >>> model.transform(df).head().selectedFeatures
+    DenseVector([7.6])
+    >>> model.selectedFeatures
+    [2]
+    >>> selectorPath = temp_path + "/selector"
+    >>> selector.save(selectorPath)
+    >>> loadedSelector = UnivariateFeatureSelector.load(selectorPath)
+    >>> loadedSelector.getSelectionThreshold() == selector.getSelectionThreshold()
+    True
+    >>> modelPath = temp_path + "/selector-model"
+    >>> model.save(modelPath)
+    >>> loadedModel = UnivariateFeatureSelectorModel.load(modelPath)
+    >>> loadedModel.selectedFeatures == model.selectedFeatures
+    True
+    >>> loadedModel.transform(df).take(1) == model.transform(df).take(1)
+    True
+    """
+
+    @keyword_only
+    def __init__(self, *, featuresCol="features", outputCol=None,
+                 labelCol="label", selectionMode="numTopFeatures"):
+        """
+        __init__(self, \\*, featuresCol="features", outputCol=None, \
+                 labelCol="label", selectionMode="numTopFeatures")
+        """
+        super(UnivariateFeatureSelector, self).__init__()
+        self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.UnivariateFeatureSelector",
+                                            self.uid)
+        kwargs = self._input_kwargs
+        self.setParams(**kwargs)
+
+    @keyword_only
+    @since("3.1.1")
+    def setParams(self, *, featuresCol="features", outputCol=None,
+                  labelCol="labels", selectionMode="numTopFeatures"):
+        """
+        setParams(self, \\*, featuresCol="features", outputCol=None, \
+                  labelCol="labels", selectionMode="numTopFeatures")
+        Sets params for this UnivariateFeatureSelector.
+        """
+        kwargs = self._input_kwargs
+        return self._set(**kwargs)
+
+    @since("3.1.1")
+    def setFeatureType(self, value):
+        """
+        Sets the value of :py:attr:`featureType`.
+        """
+        return self._set(featureType=value)
+
+    @since("3.1.1")
+    def setLabelType(self, value):
+        """
+        Sets the value of :py:attr:`labelType`.
+        """
+        return self._set(labelType=value)
+
+    @since("3.1.1")
+    def setSelectionMode(self, value):
+        """
+        Sets the value of :py:attr:`selectionMode`.
+        """
+        return self._set(selectionMode=value)
+
+    @since("3.1.1")
+    def setSelectionThreshold(self, value):
+        """
+        Sets the value of :py:attr:`selectionThreshold`.
+        """
+        return self._set(selectionThreshold=value)
+
+    def setFeaturesCol(self, value):
+        """
+        Sets the value of :py:attr:`featuresCol`.
+        """
+        return self._set(featuresCol=value)
+
+    def setOutputCol(self, value):
+        """
+        Sets the value of :py:attr:`outputCol`.
+        """
+        return self._set(outputCol=value)
+
+    def setLabelCol(self, value):
+        """
+        Sets the value of :py:attr:`labelCol`.
+        """
+        return self._set(labelCol=value)
+
+    def _create_model(self, java_model):
+        return UnivariateFeatureSelectorModel(java_model)
+
+
+class UnivariateFeatureSelectorModel(JavaModel, _UnivariateFeatureSelectorParams, JavaMLReadable,
+                                     JavaMLWritable):
+    """
+    Model fitted by :py:class:`UnivariateFeatureSelector`.
+
+    .. versionadded:: 3.1.1
+    """
+
+    @since("3.1.1")
+    def setFeaturesCol(self, value):
+        """
+        Sets the value of :py:attr:`featuresCol`.
+        """
+        return self._set(featuresCol=value)
+
+    @since("3.1.1")
+    def setOutputCol(self, value):
+        """
+        Sets the value of :py:attr:`outputCol`.
+        """
+        return self._set(outputCol=value)
+
+    @property
+    @since("3.1.1")
+    def selectedFeatures(self):
+        """
+        List of indices to select (filter).
+        """
+        return self._call_java("selectedFeatures")
+
+
 if __name__ == "__main__":
     import doctest
     import sys
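
The doctest above exercises the continuous-features/categorical-label combination
(f_classif). For the continuous/continuous combination, where Spark picks f_regression,
here is a minimal sketch reusing the data from the FValueSelector doctest removed above
(`spark` is assumed to be an active SparkSession, as in the doctests):

```python
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import UnivariateFeatureSelector

df = spark.createDataFrame(
    [(Vectors.dense([6.0, 7.0, 0.0, 7.0, 6.0, 0.0]), 4.6),
     (Vectors.dense([0.0, 9.0, 6.0, 0.0, 5.0, 9.0]), 6.6),
     (Vectors.dense([0.0, 9.0, 3.0, 0.0, 5.0, 5.0]), 5.1),
     (Vectors.dense([0.0, 9.0, 8.0, 5.0, 6.0, 4.0]), 7.6),
     (Vectors.dense([8.0, 9.0, 6.0, 5.0, 4.0, 4.0]), 9.0),
     (Vectors.dense([8.0, 9.0, 6.0, 4.0, 0.0, 0.0]), 9.0)],
    ["features", "label"])

selector = UnivariateFeatureSelector(outputCol="selectedFeatures")
# numTopFeatures is the default selection mode, so selectionThreshold
# here is the number of top features to keep.
selector.setFeatureType("continuous").setLabelType("continuous").setSelectionThreshold(1)

model = selector.fit(df)
print(model.selectedFeatures)  # [2], as in the removed FValueSelector doctest
```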
diff --git a/python/pyspark/ml/feature.pyi b/python/pyspark/ml/feature.pyi
index 4999def..33e4691 100644
--- a/python/pyspark/ml/feature.pyi
+++ b/python/pyspark/ml/feature.pyi
@@ -1456,38 +1456,6 @@ class _SelectorModel(JavaModel, _SelectorParams):
     @property
     def selectedFeatures(self) -> List[int]: ...
 
-class ANOVASelector(
-    _Selector[ANOVASelectorModel], JavaMLReadable[ANOVASelector], JavaMLWritable
-):
-    def __init__(
-        self,
-        numTopFeatures: int = ...,
-        featuresCol: str = ...,
-        outputCol: Optional[str] = ...,
-        labelCol: str = ...,
-        selectorType: str = ...,
-        percentile: float = ...,
-        fpr: float = ...,
-        fdr: float = ...,
-        fwe: float = ...,
-    ) -> None: ...
-    def setParams(
-        self,
-        numTopFeatures: int = ...,
-        featuresCol: str = ...,
-        outputCol: Optional[str] = ...,
-        labelCol: str = ...,
-        selectorType: str = ...,
-        percentile: float = ...,
-        fpr: float = ...,
-        fdr: float = ...,
-        fwe: float = ...,
-    ) -> ANOVASelector: ...
-
-class ANOVASelectorModel(
-    _SelectorModel, JavaMLReadable[ANOVASelectorModel], JavaMLWritable
-): ...
-
 class ChiSqSelector(
     _Selector[ChiSqSelectorModel],
     JavaMLReadable[ChiSqSelector],
@@ -1565,38 +1533,6 @@ class VectorSizeHint(
     def setInputCol(self, value: str) -> VectorSizeHint: ...
     def setHandleInvalid(self, value: str) -> VectorSizeHint: ...
 
-class FValueSelector(
-    _Selector[FValueSelectorModel], JavaMLReadable[FValueSelector], JavaMLWritable
-):
-    def __init__(
-        self,
-        numTopFeatures: int = ...,
-        featuresCol: str = ...,
-        outputCol: Optional[str] = ...,
-        labelCol: str = ...,
-        selectorType: str = ...,
-        percentile: float = ...,
-        fpr: float = ...,
-        fdr: float = ...,
-        fwe: float = ...,
-    ) -> None: ...
-    def setParams(
-        self,
-        numTopFeatures: int = ...,
-        featuresCol: str = ...,
-        outputCol: Optional[str] = ...,
-        labelCol: str = ...,
-        selectorType: str = ...,
-        percentile: float = ...,
-        fpr: float = ...,
-        fdr: float = ...,
-        fwe: float = ...,
-    ) -> FValueSelector: ...
-
-class FValueSelectorModel(
-    _SelectorModel, JavaMLReadable[FValueSelectorModel], JavaMLWritable
-): ...
-
 class _VarianceThresholdSelectorParams(HasFeaturesCol, HasOutputCol):
     varianceThreshold: Param[float] = ...
     def getVarianceThreshold(self) -> float: ...
@@ -1633,3 +1569,55 @@ class VarianceThresholdSelectorModel(
     def setOutputCol(self, value: str) -> VarianceThresholdSelectorModel: ...
     @property
     def selectedFeatures(self) -> List[int]: ...
+
+class _UnivariateFeatureSelectorParams(HasFeaturesCol, HasOutputCol, HasLabelCol):
+    featureType: Param[str] = ...
+    labelType: Param[str] = ...
+    selectionMode: Param[str] = ...
+    selectionThreshold: Param[float] = ...
+    def __init__(self, *args: Any): ...
+    def getFeatureType(self) -> str: ...
+    def getLabelType(self) -> str: ...
+    def getSelectionMode(self) -> str: ...
+    def getSelectionThreshold(self) -> float: ...
+
+class UnivariateFeatureSelector(
+    JavaEstimator[UnivariateFeatureSelectorModel],
+    _UnivariateFeatureSelectorParams,
+    JavaMLReadable[UnivariateFeatureSelector],
+    JavaMLWritable,
+):
+    def __init__(
+        self,
+        *,
+        featuresCol: str = ...,
+        outputCol: Optional[str] = ...,
+        labelCol: str = ...,
+        selectionMode: str = ...,
+    ) -> None: ...
+    def setParams(
+        self,
+        *,
+        featuresCol: str = ...,
+        outputCol: Optional[str] = ...,
+        labelCol: str = ...,
+        selectionMode: str = ...,
+    ) -> UnivariateFeatureSelector: ...
+    def setFeatureType(self, value: str) -> UnivariateFeatureSelector: ...
+    def setLabelType(self, value: str) -> UnivariateFeatureSelector: ...
+    def setSelectionMode(self, value: str) -> UnivariateFeatureSelector: ...
+    def setSelectionThreshold(self, value: float) -> UnivariateFeatureSelector: ...
+    def setFeaturesCol(self, value: str) -> UnivariateFeatureSelector: ...
+    def setOutputCol(self, value: str) -> UnivariateFeatureSelector: ...
+    def setLabelCol(self, value: str) -> UnivariateFeatureSelector: ...
+
+class UnivariateFeatureSelectorModel(
+    JavaModel,
+    _UnivariateFeatureSelectorParams,
+    JavaMLReadable[UnivariateFeatureSelectorModel],
+    JavaMLWritable,
+):
+    def setFeaturesCol(self, value: str) -> UnivariateFeatureSelectorModel: ...
+    def setOutputCol(self, value: str) -> UnivariateFeatureSelectorModel: ...
+    @property
+    def selectedFeatures(self) -> List[int]: ...
diff --git a/python/pyspark/ml/stat.py b/python/pyspark/ml/stat.py
index 4388de1..60eeb68 100644
--- a/python/pyspark/ml/stat.py
+++ b/python/pyspark/ml/stat.py
@@ -467,154 +467,6 @@ class MultivariateGaussian(object):
         self.cov = cov
 
 
-class ANOVATest(object):
-    """
-    Conduct ANOVA Classification Test for continuous features against categorical labels.
-
-    .. versionadded:: 3.1.0
-    """
-    @staticmethod
-    def test(dataset, featuresCol, labelCol, flatten=False):
-        """
-        Perform an ANOVA test using dataset.
-
-        .. versionadded:: 3.1.0
-
-        Parameters
-        ----------
-        dataset : :py:class:`pyspark.sql.DataFrame`
-            DataFrame of categorical labels and continuous features.
-        featuresCol : str
-            Name of features column in dataset, of type `Vector` (`VectorUDT`).
-        labelCol : str
-            Name of label column in dataset, of any numerical type.
-        flatten : bool, optional
-            if True, flattens the returned dataframe.
-
-        Returns
-        -------
-        :py:class:`pyspark.sql.DataFrame`
-            DataFrame containing the test result for every feature against the label.
-            If flatten is True, this DataFrame will contain one row per feature with the following
-            fields:
-
-            - `featureIndex: int`
-            - `pValue: float`
-            - `degreesOfFreedom: int`
-            - `fValue: float`
-
-            If flatten is False, this DataFrame will contain a single Row with the following fields:
-
-            - `pValues: Vector`
-            - `degreesOfFreedom: Array[int]`
-            - `fValues: Vector`
-
-            Each of these fields has one value per feature.
-
-        Examples
-        --------
-        >>> from pyspark.ml.linalg import Vectors
-        >>> from pyspark.ml.stat import ANOVATest
-        >>> dataset = [[2.0, Vectors.dense([0.43486404, 0.57153633, 0.43175686,
-        ...                                 0.51418671, 0.61632374, 0.96565515])],
-        ...            [1.0, Vectors.dense([0.49162732, 0.6785187, 0.85460572,
-        ...                                 0.59784822, 0.12394819, 0.53783355])],
-        ...            [2.0, Vectors.dense([0.30879653, 0.54904515, 0.17103889,
-        ...                                 0.40492506, 0.18957493, 0.5440016])],
-        ...            [3.0, Vectors.dense([0.68114391, 0.60549825, 0.69094651,
-        ...                                 0.62102109, 0.05471483, 0.96449167])]]
-        >>> dataset = spark.createDataFrame(dataset, ["label", "features"])
-        >>> anovaResult = ANOVATest.test(dataset, 'features', 'label')
-        >>> row = anovaResult.select("fValues", "pValues").collect()
-        >>> row[0].fValues
-        DenseVector([4.0264, 18.4713, 3.4659, 1.9042, 0.5532, 0.512])
-        >>> row[0].pValues
-        DenseVector([0.3324, 0.1623, 0.3551, 0.456, 0.689, 0.7029])
-        >>> anovaResult = ANOVATest.test(dataset, 'features', 'label', True)
-        >>> row = anovaResult.orderBy("featureIndex").collect()
-        >>> row[0].fValue
-        4.026438671875297
-        """
-        sc = SparkContext._active_spark_context
-        javaTestObj = _jvm().org.apache.spark.ml.stat.ANOVATest
-        args = [_py2java(sc, arg) for arg in (dataset, featuresCol, labelCol, flatten)]
-        return _java2py(sc, javaTestObj.test(*args))
-
-
-class FValueTest(object):
-    """
-    Conduct F Regression test for continuous features against continuous labels.
-
-    .. versionadded:: 3.1.0
-    """
-    @staticmethod
-    def test(dataset, featuresCol, labelCol, flatten=False):
-        """
-        Perform a F Regression test using dataset.
-
-        .. versionadded:: 3.1.0
-
-        Parameters
-        ----------
-        dataset : :py:class:`pyspark.sql.DataFrame`
-            DataFrame of continuous labels and continuous features.
-        featuresCol : str
-          Name of features column in dataset, of type `Vector` (`VectorUDT`).
-        labelCol : str
-            Name of label column in dataset, of any numerical type.
-        flatten : bool, optional
-            if True, flattens the returned dataframe.
-
-        Returns
-        -------
-        :py:class:`pyspark.sql.DataFrame`
-            DataFrame containing the test result for every feature against the label.
-            If flatten is True, this DataFrame will contain one row per feature with the following
-            fields:
-
-            - `featureIndex: int`
-            - `pValue: float`
-            - `degreesOfFreedom: int`
-            - `fValue: float`
-
-            If flatten is False, this DataFrame will contain a single Row with the following fields:
-
-            - `pValues: Vector`
-            - `degreesOfFreedom: Array[int]`
-            - `fValues: Vector`
-
-            Each of these fields has one value per feature.
-
-        Examples
-        --------
-        >>> from pyspark.ml.linalg import Vectors
-        >>> from pyspark.ml.stat import FValueTest
-        >>> dataset = [[0.57495218, Vectors.dense([0.43486404, 0.57153633, 0.43175686,
-        ...                                        0.51418671, 0.61632374, 0.96565515])],
-        ...            [0.84619853, Vectors.dense([0.49162732, 0.6785187, 0.85460572,
-        ...                                        0.59784822, 0.12394819, 0.53783355])],
-        ...            [0.39777647, Vectors.dense([0.30879653, 0.54904515, 0.17103889,
-        ...                                        0.40492506, 0.18957493, 0.5440016])],
-        ...            [0.79201573, Vectors.dense([0.68114391, 0.60549825, 0.69094651,
-        ...                                        0.62102109, 0.05471483, 0.96449167])]]
-        >>> dataset = spark.createDataFrame(dataset, ["label", "features"])
-        >>> fValueResult = FValueTest.test(dataset, 'features', 'label')
-        >>> row = fValueResult.select("fValues", "pValues").collect()
-        >>> row[0].fValues
-        DenseVector([3.741, 7.5807, 142.0684, 34.9849, 0.4112, 0.0539])
-        >>> row[0].pValues
-        DenseVector([0.1928, 0.1105, 0.007, 0.0274, 0.5871, 0.838])
-        >>> fValueResult = FValueTest.test(dataset, 'features', 'label', True)
-        >>> row = fValueResult.orderBy("featureIndex").collect()
-        >>> row[0].fValue
-        3.7409548308350593
-        """
-        sc = SparkContext._active_spark_context
-        javaTestObj = _jvm().org.apache.spark.ml.stat.FValueTest
-        args = [_py2java(sc, arg) for arg in (dataset, featuresCol, labelCol, flatten)]
-        return _java2py(sc, javaTestObj.test(*args))
-
-
 if __name__ == "__main__":
     import doctest
     import numpy
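
Note that only the Python wrappers are removed here; the Scala ANOVATest and FValueTest
implementations stay in place (the Scala suite above still calls ANOVATest.test and
FValueTest.test directly). Python code that used these tests purely to rank or filter
features can migrate to UnivariateFeatureSelector. A minimal sketch of that migration,
reusing the dataset from the removed ANOVATest doctest (again assuming an active `spark`
session):

```python
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import UnivariateFeatureSelector

dataset = spark.createDataFrame(
    [[2.0, Vectors.dense([0.43486404, 0.57153633, 0.43175686,
                          0.51418671, 0.61632374, 0.96565515])],
     [1.0, Vectors.dense([0.49162732, 0.6785187, 0.85460572,
                          0.59784822, 0.12394819, 0.53783355])],
     [2.0, Vectors.dense([0.30879653, 0.54904515, 0.17103889,
                          0.40492506, 0.18957493, 0.5440016])],
     [3.0, Vectors.dense([0.68114391, 0.60549825, 0.69094651,
                          0.62102109, 0.05471483, 0.96449167])]],
    ["label", "features"])

# Continuous features with a categorical label selects the ANOVA F-test,
# the same score function the removed ANOVATest wrapper exposed.
selector = (UnivariateFeatureSelector(outputCol="selected")
            .setFeatureType("continuous")
            .setLabelType("categorical")
            .setSelectionThreshold(1))  # keep the single top feature

model = selector.fit(dataset)
print(model.selectedFeatures)  # [1], the feature with the largest fValue (18.47) above
```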
diff --git a/python/pyspark/ml/stat.pyi b/python/pyspark/ml/stat.pyi
index 83b0f7e..30485a7 100644
--- a/python/pyspark/ml/stat.pyi
+++ b/python/pyspark/ml/stat.pyi
@@ -75,15 +75,3 @@ class MultivariateGaussian:
     mean: Vector
     cov: Matrix
     def __init__(self, mean: Vector, cov: Matrix) -> None: ...
-
-class ANOVATest:
-    @staticmethod
-    def test(
-        dataset: DataFrame, featuresCol: str, labelCol: str, flatten: bool = ...
-    ) -> DataFrame: ...
-
-class FValueTest:
-    @staticmethod
-    def test(
-        dataset: DataFrame, featuresCol: str, labelCol: str, flatten: bool = ...
-    ) -> DataFrame: ...


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org