Posted to reviews@spark.apache.org by pralabhkumar <gi...@git.apache.org> on 2017/05/26 07:25:30 UTC

[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

GitHub user pralabhkumar opened a pull request:

    https://github.com/apache/spark/pull/18118

    SPARK-20199 : Provided featureSubsetStrategy to GBTClassifier

    ## What changes were proposed in this pull request?
    
    Provided featureSubsetStrategy to GBTClassifier:
    a) Moved featureSubsetStrategy to TreeEnsembleParams.
    b) Changed GBTClassifier to pass featureSubsetStrategy to tree training, e.g.
       val firstTreeModel = firstTree.train(input, treeStrategy, featureSubsetStrategy)
    
    ## How was this patch tested?
    a) Tested GradientBoostedTreeClassifierExample after adding a .setFeatureSubsetStrategy call to its GBTClassifier.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
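
The test described in this pull request amounts to one extra setter call on the estimator. The following is a minimal sketch, not the code from the PR. It assumes a Spark ML release that includes this change (2.3.0 or later, as far as can be told) is on the classpath. The object name, column names, iteration count, and the choice of "sqrt" are illustrative assumptions.

```scala
import org.apache.spark.ml.classification.GBTClassifier

object GbtFeatureSubsetExample {
  // Assumption: spark-mllib 2.3.0+ is available, so the setter added by
  // this pull request exists on GBTClassifier.
  def configuredGbt(): GBTClassifier =
    new GBTClassifier()
      .setLabelCol("label")             // placeholder column name
      .setFeaturesCol("features")       // placeholder column name
      .setMaxIter(10)                   // number of boosting iterations (trees)
      .setFeatureSubsetStrategy("sqrt") // consider a subset of features at each node

  def main(args: Array[String]): Unit = {
    val gbt = configuredGbt()
    // The getter returns the configured strategy name in lowercase.
    println(s"featureSubsetStrategy = ${gbt.getFeatureSubsetStrategy}")
  }
}
```

Only the estimator is configured here. Fitting it would additionally need a SparkSession and a DataFrame with the two columns named above.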


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pralabhkumar/spark develop

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18118
    
----
commit b0444fa75f4cc33a0c35cf88664a89a1c425e7a1
Author: Pralabh Kumar <pr...@gmail.com>
Date:   2017-05-26T07:16:32Z

    SPARK-20199 : Provided featureSubsetStrategy to GBTClassifier

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @MLnick @jkbradley Please find some time to review this. @sethah has given an LGTM.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77501/
    Test FAILed.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77504/
    Test PASSed.




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123422892
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
    @@ -192,6 +196,9 @@ object GBTClassifier extends DefaultParamsReadable[GBTClassifier] {
     
       @Since("2.0.0")
       override def load(path: String): GBTClassifier = super.load(path)
    +
    +  final val supportedFeatureSubsetStrategies: Array[String] =
    --- End diff --
    
    @sethah 
    Done 
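
The diff above adds a `supportedFeatureSubsetStrategies` array to the GBTClassifier companion object. As a rough sketch of how such a list can back a case-insensitive check, the example below reuses the five names and the `Locale.ROOT` lowercasing that appear in the RandomForestParams diff quoted elsewhere in this thread. The object and method names are hypothetical, and only the named strategies are handled.

```scala
import java.util.Locale

object SubsetStrategyValidation {
  // Strategy names as listed in the quoted diff; stored in lowercase.
  val supported: Array[String] = Array("auto", "all", "onethird", "sqrt", "log2")

  // Lowercase with Locale.ROOT so the result does not depend on the
  // JVM's default locale.
  def normalize(value: String): String = value.toLowerCase(Locale.ROOT)

  // True when the value matches one of the named strategies, ignoring case.
  def isSupported(value: String): Boolean = supported.contains(normalize(value))
}
```

Under this sketch, `isSupported("Sqrt")` is true and `isSupported("half")` is false.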




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    cc @sethah also 




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #79147 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79147/testReport)** for PR 18118 at commit [`61745ba`](https://github.com/apache/spark/commit/61745ba2e5be5f4838a23a226ef67142dd6090cf).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83350/
    Test FAILed.




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148095844
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
    @@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTClassifier()
    --- End diff --
    
    Not sure if we need these tests. We already test the `featureSubsetStrategy` in the RandomForest implementation. At any rate, let's not set parameters that aren't strictly relevant to the test (stepSize, impurity, etc...). 
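
For context on what the `featureSubsetStrategy` values control, here is a simplified sketch of how a strategy name could translate into the number of candidate features examined at each tree node. It is not taken from this PR. The rounding rules and the handling of "auto" are assumptions modeled on how Spark's random forest code is understood to behave, and should be checked against the Spark source before being relied on. All names are hypothetical.

```scala
import java.util.Locale

object FeatureSubsetSketch {
  // Resolve "auto" to a concrete strategy name.
  // Assumption: a single tree uses all features; otherwise classification
  // uses "sqrt" and regression uses "onethird".
  def resolveAuto(strategy: String, numTrees: Int, isClassification: Boolean): String =
    strategy.toLowerCase(Locale.ROOT) match {
      case "auto" =>
        if (numTrees == 1) "all"
        else if (isClassification) "sqrt"
        else "onethird"
      case other => other
    }

  // Number of candidate features considered at each node for a resolved strategy.
  def numFeaturesPerNode(strategy: String, numFeatures: Int): Int =
    strategy match {
      case "all"      => numFeatures
      case "sqrt"     => math.ceil(math.sqrt(numFeatures.toDouble)).toInt
      case "log2"     => math.max(1, math.ceil(math.log(numFeatures.toDouble) / math.log(2.0)).toInt)
      case "onethird" => math.ceil(numFeatures / 3.0).toInt
      case other      => throw new IllegalArgumentException(s"Unsupported strategy: $other")
    }
}
```

With 100 features, this sketch yields 10 candidates per node for "sqrt", 7 for "log2", and 34 for "onethird".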




[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by mpjlu <gi...@git.apache.org>.
Github user mpjlu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r119271662
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -441,12 +415,44 @@ private[ml] trait RandomForestParams extends TreeEnsembleParams {
       final def getFeatureSubsetStrategy: String = $(featureSubsetStrategy).toLowerCase(Locale.ROOT)
     }
     
    -private[spark] object RandomForestParams {
    -  // These options should be lowercase.
    -  final val supportedFeatureSubsetStrategies: Array[String] =
    -    Array("auto", "all", "onethird", "sqrt", "log2").map(_.toLowerCase(Locale.ROOT))
    +
    +
    +/**
    + * Parameters for Random Forest algorithms.
    + */
    +private[ml] trait RandomForestParams extends TreeEnsembleParams {
    +
    +  /**
    +   * Number of trees to train (>= 1).
    +   * If 1, then no bootstrapping is used.  If > 1, then bootstrapping is done.
    +   * TODO: Change to always do bootstrapping (simpler).  SPARK-7130
    +   * (default = 20)
    +   *
    +   * Note: The reason that we cannot add this to both GBT and RF (i.e. in TreeEnsembleParams)
    +   * is the param `maxIter` controls how many trees a GBT has. The semantics in the algorithms
    +   * are a bit different.
    +   * @group param
    +   */
    +  final val numTrees: IntParam = new IntParam(this, "numTrees", "Number of trees to train (>= 1)",
    +    ParamValidators.gtEq(1))
    +
    +  setDefault(numTrees -> 20)
    +
    +  /**
    +   * @deprecated This method is deprecated and will be removed in 3.0.0.
    +   * @group setParam
    +   */
    +  @deprecated("This method is deprecated and will be removed in 3.0.0.", "2.1.0")
    +  def setNumTrees(value: Int): this.type = set(numTrees, value)
    +
    +  /** @group getParam */
    +  final def getNumTrees: Int = $(numTrees)
    +
    +
     }
     
    +
    +
    --- End diff --
    
    too many blank lines




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148063860
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
    @@ -108,7 +108,8 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
         val instr = Instrumentation.create(this, oldDataset)
         instr.logParams(params: _*)
     
    -    val trees = RandomForest.run(oldDataset, strategy, numTrees = 1, featureSubsetStrategy = "all",
    +    val trees = RandomForest.run(oldDataset, strategy, numTrees = 1,
    --- End diff --
    
    revert this




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78417/
    Test FAILed.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @mpjlu Thanks for reviewing the code. I have made the changes as suggested.
    The build passes with all test cases.
    
    Please review and let me know if further changes are required.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77578/
    Test FAILed.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #79150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79150/testReport)** for PR 18118 at commit [`f547dec`](https://github.com/apache/spark/commit/f547dec79b6c4e44d1cadcbf22f2ab52e9e403eb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #77501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77501/testReport)** for PR 18118 at commit [`9b3d03b`](https://github.com/apache/spark/commit/9b3d03ba99e69176e94b518a3f8cd2ac9e55d10e).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    The build of 12d83aa is successful. Please review the pull request.
    @MLnick @sethah @mpjlu @srowen 




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @MLnick
    Thanks for reviewing. I have added comments.
    
    Please review them.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83358/
    Test FAILed.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123042480
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
    @@ -49,14 +49,16 @@ import org.apache.spark.rdd.RDD
     @Since("1.2.0")
     class GradientBoostedTrees private[spark] (
         private val boostingStrategy: BoostingStrategy,
    -    private val seed: Int)
    +    private val seed: Int,
    +    private val featureSubsetStrategy: String)
    --- End diff --
    
    MLlib is in maintenance mode. I'd prefer to not make this change, and hard code it to "all".
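
The suggestion above can be pictured as follows: the legacy entry point keeps its existing signature and pins the strategy to "all", while only the newer code path accepts the parameter. This is a schematic sketch with hypothetical names (`TrainConfig`, `newImplConfig`, `legacyConfig`), none of which are Spark classes or methods.

```scala
object LegacyWrapperSketch {
  // Hypothetical training configuration; not a Spark class.
  final case class TrainConfig(seed: Int, featureSubsetStrategy: String)

  // Stand-in for the newer implementation, which takes the strategy explicitly.
  def newImplConfig(seed: Int, featureSubsetStrategy: String): TrainConfig =
    TrainConfig(seed, featureSubsetStrategy)

  // Stand-in for the legacy entry point: the public signature is unchanged
  // and the strategy is hard-coded to "all", preserving existing behavior.
  def legacyConfig(seed: Int): TrainConfig =
    newImplConfig(seed, featureSubsetStrategy = "all")
}
```

Callers of the legacy path see no API change, which fits a component in maintenance mode.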




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #78420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78420/testReport)** for PR 18118 at commit [`7970293`](https://github.com/apache/spark/commit/79702933d321051222073057b25305831df84c6d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @MLnick @WeichenXu123 @sethah Thanks for your help throughout the process.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Jenkins test this please




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @MLnick @sethah, please let me know if you are OK with the changes so that we can proceed. Thanks for your help :)




[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by mpjlu <gi...@git.apache.org>.
Github user mpjlu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r119271552
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -305,7 +305,7 @@ private[ml] object TreeRegressorParams {
     }
     
     private[ml] trait DecisionTreeRegressorParams extends DecisionTreeParams
    -  with TreeRegressorParams with HasVarianceCol {
    +  with TreeRegressorParams with HasVarianceCol  {
    --- End diff --
    
    two spaces




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @MLnick Thanks for reviewing the code. I have made the changes as suggested.
    
    Please proceed if it's good to go.
    
    Thanks




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148197373
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
    @@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTClassifier()
    --- End diff --
    
    Removed stepSize, impurity, and other parameters.


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123040767
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala ---
    @@ -284,11 +290,13 @@ private[spark] object GradientBoostedTrees extends Logging {
         logDebug("##########")
         logDebug("Building tree 0")
         logDebug("##########")
    +    logDebug("Featuer Subset Strategy " + featureSubsetStrategy)
    --- End diff --
    
    this will get logged with the instrumentation, so I'd rather remove it.


---


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123257746
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -359,38 +365,6 @@ private[ml] trait TreeEnsembleParams extends DecisionTreeParams {
           oldImpurity: OldImpurity): OldStrategy = {
         super.getOldStrategy(categoricalFeatures, numClasses, oldAlgo, oldImpurity, getSubsamplingRate)
       }
    -}
    -
    -/**
    - * Parameters for Random Forest algorithms.
    - */
    -private[ml] trait RandomForestParams extends TreeEnsembleParams {
    --- End diff --
    
    Why is this being moved?


---


[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123051005
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala ---
    @@ -319,8 +327,10 @@ private[spark] object GradientBoostedTrees extends Logging {
           logDebug("###################################################")
           logDebug("Gradient boosting tree iteration " + m)
           logDebug("###################################################")
    +
           val dt = new DecisionTreeRegressor().setSeed(seed + m)
    -      val model = dt.train(data, treeStrategy)
    +
    --- End diff --
    
    remove blank line


---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #78417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78417/testReport)** for PR 18118 at commit [`13ce412`](https://github.com/apache/spark/commit/13ce412ce47ea21b8850c47e361ad872425bfb9a).


---


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123264302
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala ---
    @@ -284,11 +290,13 @@ private[spark] object GradientBoostedTrees extends Logging {
         logDebug("##########")
         logDebug("Building tree 0")
         logDebug("##########")
    +    logDebug("Featuer Subset Strategy " + featureSubsetStrategy)
     
         // Initialize tree
         timer.start("building tree 0")
         val firstTree = new DecisionTreeRegressor().setSeed(seed)
    -    val firstTreeModel = firstTree.train(input, treeStrategy)
    +
    --- End diff --
    
    done


---


[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by mpjlu <gi...@git.apache.org>.
Github user mpjlu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r119271063
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
    @@ -99,6 +99,8 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
       @Since("2.0.0")
       def setVarianceCol(value: String): this.type = set(varianceCol, value)
     
    +
    +
    --- End diff --
    
    delete this


---


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148197257
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
    @@ -108,7 +108,8 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
         val instr = Instrumentation.create(this, oldDataset)
         instr.logParams(params: _*)
     
    -    val trees = RandomForest.run(oldDataset, strategy, numTrees = 1, featureSubsetStrategy = "all",
    +    val trees = RandomForest.run(oldDataset, strategy, numTrees = 1,
    --- End diff --
    
    done


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r125484321
  
    --- Diff: project/MimaExcludes.scala ---
    @@ -196,7 +196,10 @@ object MimaExcludes {
           ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.streaming.StreamingQueryException.startOffset"),
           ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.streaming.StreamingQueryException.endOffset"),
           ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.streaming.StreamingQueryException.this"),
    -      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryException.query")
    +      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryException.query"),
    +    
    +     // [SPARK-20199][MLLIB] Add featureSubSet to GradientBoostedTrees
    +     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.tree.GradientBoostedTrees.this")
    --- End diff --
    
    @MLnick Yes, you are correct; I have removed it. Thanks.


---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @MLnick @sethah please find some time to look into this


---


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #77578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77578/testReport)** for PR 18118 at commit [`f9aa30c`](https://github.com/apache/spark/commit/f9aa30c175ab143418b54c1910783d6af94b6662).


---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    ping @sethah  @MLnick 


---


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r149761025
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
    @@ -166,6 +166,40 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTRegressor()
    +      .setMaxDepth(3)
    +      .setMaxIter(5)
    +      .setSubsamplingRate(1.0)
    +      .setStepSize(0.5)
    +      .setSeed(123)
    +      .setFeatureSubsetStrategy("all")
    +
    +    // In this data, feature 1 is very important.
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
    +    val categoricalFeatures = Map.empty[Int, Int]
    +    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
    +
    +    val importances = gbt.fit(df).featureImportances
    +    val mostImportantFeature = importances.argmax
    +    assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    +    assert(importances.toArray.forall(_ >= 0.0))
    +
    +    // GBT with different featureSubsetStrategy
    +    val gbtWithFeatureSubset = gbt.setFeatureSubsetStrategy("1")
    +    val importanceFeatures = gbtWithFeatureSubset.fit(df).featureImportances
    +    val mostIF = importanceFeatures.argmax
    +    assert(!(mostImportantFeature === mostIF))
    +    assert(importanceFeatures.toArray.sum === 1.0)
    +    assert(importanceFeatures.toArray.forall(_ >= 0.0))
    +    assert(!(importanceFeatures.toDense.values.deep === importances.toDense.values.deep))
    --- End diff --
    
    Same here - we can remove unnecessary assertions as per https://github.com/apache/spark/pull/18118#discussion_r148096176
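
For context on why the two fits in the quoted test can rank features differently: `featureSubsetStrategy` controls how many candidate features each tree node may consider when choosing a split. The following is a minimal sketch of that mapping, not Spark's actual implementation; the object `FeatureSubsetSketch` and the method `featuresPerNode` are hypothetical names, and the rounding rules are assumptions chosen for illustration.

```scala
import scala.util.Try

// Hypothetical helper, for illustration only: maps a strategy string to the
// number of features a node may consider when searching for a split.
object FeatureSubsetSketch {
  def featuresPerNode(strategy: String, numFeatures: Int): Int = strategy match {
    case "all"      => numFeatures
    case "sqrt"     => math.sqrt(numFeatures).ceil.toInt
    case "log2"     => math.max(1, (math.log(numFeatures) / math.log(2)).ceil.toInt)
    case "onethird" => (numFeatures / 3.0).ceil.toInt
    case other =>
      // A positive integer is read as an absolute count, capped at numFeatures.
      Try(other.toInt).toOption match {
        case Some(n) if n >= 1 => math.min(n, numFeatures)
        case _ =>
          // A value in (0, 1] is read as a fraction of the features.
          Try(other.toDouble).toOption match {
            case Some(f) if f > 0.0 && f <= 1.0 => math.ceil(f * numFeatures).toInt
            case _ => throw new IllegalArgumentException(s"Unsupported strategy: $other")
          }
      }
  }
}
```

Under these assumptions, `"all"` lets every node see every feature, while `"1"` restricts each node to a single candidate feature, which is why the test expects the second model's importances to differ from the first.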


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r119364846
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
    @@ -99,6 +99,8 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
       @Since("2.0.0")
       def setVarianceCol(value: String): this.type = set(varianceCol, value)
     
    +
    +
    --- End diff --
    
    done


---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #83631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83631/testReport)** for PR 18118 at commit [`ea03683`](https://github.com/apache/spark/commit/ea03683a4c388eaee70bf66fc41fd89a3a81a6a3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    ok to test


---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #83627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83627/testReport)** for PR 18118 at commit [`af01cc4`](https://github.com/apache/spark/commit/af01cc4ea2f9756d2a3405969c3d2bb5abb6be13).


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83354/
    Test FAILed.


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18118


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #83283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83283/testReport)** for PR 18118 at commit [`77aa1d7`](https://github.com/apache/spark/commit/77aa1d77c8997d1f5eb2e8485194584f34832ba8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r149886357
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
    @@ -166,6 +166,40 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTRegressor()
    +      .setMaxDepth(3)
    +      .setMaxIter(5)
    +      .setSubsamplingRate(1.0)
    +      .setStepSize(0.5)
    +      .setSeed(123)
    +      .setFeatureSubsetStrategy("all")
    +
    +    // In this data, feature 1 is very important.
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
    +    val categoricalFeatures = Map.empty[Int, Int]
    +    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
    +
    +    val importances = gbt.fit(df).featureImportances
    +    val mostImportantFeature = importances.argmax
    +    assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    +    assert(importances.toArray.forall(_ >= 0.0))
    +
    +    // GBT with different featureSubsetStrategy
    +    val gbtWithFeatureSubset = gbt.setFeatureSubsetStrategy("1")
    +    val importanceFeatures = gbtWithFeatureSubset.fit(df).featureImportances
    +    val mostIF = importanceFeatures.argmax
    +    assert(!(mostImportantFeature === mostIF))
    +    assert(importanceFeatures.toArray.sum === 1.0)
    +    assert(importanceFeatures.toArray.forall(_ >= 0.0))
    +    assert(!(importanceFeatures.toDense.values.deep === importances.toDense.values.deep))
    --- End diff --
    
    done


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Ping @MLnick @jkbradley. @sethah has given LGTM.


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123039522
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala ---
    @@ -140,6 +140,10 @@ class GBTRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: String)
       @Since("1.4.0")
       def setLossType(value: String): this.type = set(lossType, value)
     
    +  /** @group setParam */
    +  override def setFeatureSubsetStrategy(value: String): this.type =
    --- End diff --
    
    add since tag
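
A since tag records the release in which a public member first appeared. The sketch below shows the shape of the requested change in isolation. It is an illustration only: `Since` here is a local stand-in annotation rather than Spark's own, `HasFeatureSubsetStrategy` and `SketchRegressor` are hypothetical names, the default value is a placeholder, and the version string is taken from the `@Since("2.3.0")` that appears elsewhere in this thread.

```scala
import scala.annotation.StaticAnnotation

// Local stand-in for Spark's @Since annotation (assumption for this sketch).
class Since(version: String) extends StaticAnnotation

// Hypothetical shared trait that owns the parameter and a default setter.
trait HasFeatureSubsetStrategy {
  private var strategy: String = "all" // placeholder default
  def getFeatureSubsetStrategy: String = strategy
  def setFeatureSubsetStrategy(value: String): this.type = {
    strategy = value
    this
  }
}

// Hypothetical estimator: the override carries the since tag and keeps the
// fluent return type, so calls can still be chained on the concrete class.
class SketchRegressor extends HasFeatureSubsetStrategy {
  /** @group setParam */
  @Since("2.3.0")
  override def setFeatureSubsetStrategy(value: String): this.type =
    super.setFeatureSubsetStrategy(value)
}
```

Returning `this.type` rather than the trait type is what allows `new SketchRegressor().setFeatureSubsetStrategy("sqrt")` to remain a `SketchRegressor` for further chained setters.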


---


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77503/
    Test FAILed.


---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123257669
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -359,38 +365,6 @@ private[ml] trait TreeEnsembleParams extends DecisionTreeParams {
           oldImpurity: OldImpurity): OldStrategy = {
         super.getOldStrategy(categoricalFeatures, numClasses, oldAlgo, oldImpurity, getSubsamplingRate)
       }
    -}
    --- End diff --
    
    Why is this being moved around?


---


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148063967
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
    @@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
     
       /** (private[ml]) Train a decision tree on an RDD */
       private[ml] def train(data: RDD[LabeledPoint],
    -      oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
    +      oldStrategy: OldStrategy, featureSubsetStrategy: String): DecisionTreeRegressionModel = {
         val instr = Instrumentation.create(this, data)
         instr.logParams(params: _*)
     
    -    val trees = RandomForest.run(data, oldStrategy, numTrees = 1, featureSubsetStrategy = "all",
    +    val trees = RandomForest.run(data, oldStrategy, numTrees = 1,
    +      featureSubsetStrategy,
    --- End diff --
    
    move it up a line


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r119365434
  
    --- Diff: project/MimaExcludes.scala ---
    @@ -37,11 +37,15 @@ object MimaExcludes {
       // Exclude rules for 2.3.x
       lazy val v23excludes = v22excludes ++ Seq(
         // [SPARK-20495][SQL] Add StorageLevel to cacheTable API
    -    ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.cacheTable")
    +    ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.cacheTable"),
    +   
    +    // [SPARK-20199][MLLIB] Add featureSubSet to GradientBoostedTrees
    +    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.tree.GradientBoostedTrees.this")
       )
    --- End diff --
    
    I put it in the V21 excludes. Please let me know if you were expecting something else.


---


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r149886323
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala ---
    @@ -173,6 +178,10 @@ object GBTRegressor extends DefaultParamsReadable[GBTRegressor] {
     
       @Since("2.0.0")
       override def load(path: String): GBTRegressor = super.load(path)
    +
    +  @Since("2.3.0")
    --- End diff --
    
    done


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123264263
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala ---
    @@ -284,11 +290,13 @@ private[spark] object GradientBoostedTrees extends Logging {
         logDebug("##########")
         logDebug("Building tree 0")
         logDebug("##########")
    +    logDebug("Featuer Subset Strategy " + featureSubsetStrategy)
    --- End diff --
    
    done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @sethah Thanks for reviewing the code. I have made all the changes you suggested.
    
    Please review them and let me know if further changes are required. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123265283
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
    @@ -192,6 +196,9 @@ object GBTClassifier extends DefaultParamsReadable[GBTClassifier] {
     
       @Since("2.0.0")
       override def load(path: String): GBTClassifier = super.load(path)
    +
    +  final val supportedFeatureSubsetStrategies: Array[String] =
    --- End diff --
    
    Done. I will add this to GBTRegressor in the next pull request (I forgot to add it in this one).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77588/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by mpjlu <gi...@git.apache.org>.
Github user mpjlu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r119270999
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
    @@ -136,12 +136,20 @@ class GBTClassifier @Since("1.4.0") (
       @Since("1.4.0")
       override def setStepSize(value: Double): this.type = set(stepSize, value)
     
    +  /** @group setParam */
    +  override def setFeatureSubsetStrategy(value: String): this.type =
    +    set(featureSubsetStrategy, value)
    +
       // Parameters from GBTClassifierParams:
     
       /** @group setParam */
       @Since("1.4.0")
       def setLossType(value: String): this.type = set(lossType, value)
     
    +
    +
    +
    +
    --- End diff --
    
    delete blank line


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #79150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79150/testReport)** for PR 18118 at commit [`f547dec`](https://github.com/apache/spark/commit/f547dec79b6c4e44d1cadcbf22f2ab52e9e403eb).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #77497 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77497/testReport)** for PR 18118 at commit [`b0444fa`](https://github.com/apache/spark/commit/b0444fa75f4cc33a0c35cf88664a89a1c425e7a1).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @sethah The build has passed :). I have made the changes as suggested (setting maxIter and maxDepth).
    
    ping @MLnick or @jkbradley so we can move ahead with it.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123050956
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala ---
    @@ -284,11 +290,13 @@ private[spark] object GradientBoostedTrees extends Logging {
         logDebug("##########")
         logDebug("Building tree 0")
         logDebug("##########")
    +    logDebug("Featuer Subset Strategy " + featureSubsetStrategy)
     
         // Initialize tree
         timer.start("building tree 0")
         val firstTree = new DecisionTreeRegressor().setSeed(seed)
    -    val firstTreeModel = firstTree.train(input, treeStrategy)
    +
    --- End diff --
    
    remove blank line


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    ping @MLnick @sethah


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    I'll take a look at the changes in the next few days. In the meantime, you can remove "Please review http://spark.apache.org/contributing.html before opening a pull request." from the PR description.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @MLnick @sethah please find some time to look into this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148096176
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
    @@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTClassifier()
    +      .setImpurity("Gini")
    +      .setMaxDepth(3)
    +      .setMaxIter(5)
    +      .setSubsamplingRate(1.0)
    +      .setStepSize(0.5)
    +      .setSeed(123)
    +      .setFeatureSubsetStrategy("all")
    +
    +    // In this data, feature 1 is very important.
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
    +    val categoricalFeatures = Map.empty[Int, Int]
    +    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
    +
    +    val importances = gbt.fit(df).featureImportances
    +    val mostImportantFeature = importances.argmax
    +    assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    +    assert(importances.toArray.forall(_ >= 0.0))
    +
    +    // GBT with different featureSubsetStrategy
    +    val gbtWithFeatureSubset = gbt.setFeatureSubsetStrategy("1")
    +    val importanceFeatures = gbtWithFeatureSubset.fit(df).featureImportances
    +    val mostIF = importanceFeatures.argmax
    +    assert(!(mostImportantFeature === mostIF))
    +    assert(importanceFeatures.toArray.sum === 1.0)
    --- End diff --
    
    The last three assertions here aren't necessary. Also, why not use `!==`?
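
For context on the two strategy strings used in the test above (`"all"` and `"1"`), here is a minimal dependency-free sketch of how a `featureSubsetStrategy` value might be resolved to a number of candidate features per split. The object and function names (`FeatureSubsetSketch`, `featuresPerNode`) are hypothetical and are not part of Spark; the mapping is an assumption modeled on the documented meaning of the option, and `"auto"` is deliberately left out because its behavior depends on the ensemble.

```scala
import scala.util.Try

// Hypothetical helper, NOT Spark's API: maps a featureSubsetStrategy string
// to the number of features considered at each split (assumed semantics).
object FeatureSubsetSketch {
  def featuresPerNode(strategy: String, numFeatures: Int): Int = {
    require(numFeatures > 0, "numFeatures must be positive")
    strategy.toLowerCase match {
      case "all"      => numFeatures
      case "sqrt"     => math.ceil(math.sqrt(numFeatures.toDouble)).toInt
      case "log2"     =>
        math.max(1, math.ceil(math.log(numFeatures.toDouble) / math.log(2.0)).toInt)
      case "onethird" => math.ceil(numFeatures / 3.0).toInt
      case other =>
        // A positive integer string such as "1" is read as an absolute count,
        // capped at the total number of features.
        val asCount = Try(other.toInt).toOption
          .filter(_ > 0)
          .map(n => math.min(n, numFeatures))
        // A decimal string in (0.0, 1.0] such as "0.5" is read as a fraction.
        val asFraction = Try(other.toDouble).toOption
          .filter(f => f > 0.0 && f <= 1.0)
          .map(f => math.ceil(f * numFeatures).toInt)
        asCount.orElse(asFraction).getOrElse(
          throw new IllegalArgumentException(s"Unsupported featureSubsetStrategy: $strategy"))
    }
  }
}
```

Under these assumptions, `"all"` lets every split see all features, while `"1"` restricts each split to a single candidate feature. That difference is presumably why the test expects the most important feature to change between the two fits.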


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78420/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83348/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #83627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83627/testReport)** for PR 18118 at commit [`af01cc4`](https://github.com/apache/spark/commit/af01cc4ea2f9756d2a3405969c3d2bb5abb6be13).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @MLnick Thanks for reviewing. I have made all the changes you suggested. Please review.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r119364792
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
    @@ -136,12 +136,20 @@ class GBTClassifier @Since("1.4.0") (
       @Since("1.4.0")
       override def setStepSize(value: Double): this.type = set(stepSize, value)
     
    +  /** @group setParam */
    +  override def setFeatureSubsetStrategy(value: String): this.type =
    +    set(featureSubsetStrategy, value)
    +
       // Parameters from GBTClassifierParams:
     
       /** @group setParam */
       @Since("1.4.0")
       def setLossType(value: String): this.type = set(lossType, value)
     
    +
    +
    +
    +
    --- End diff --
    
    done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #77503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77503/testReport)** for PR 18118 at commit [`426bc68`](https://github.com/apache/spark/commit/426bc68e4b75fba76d993e0da94d28982b449c72).
     * This patch **fails to generate documentation**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @jkbradley Please review the pull request


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r149886343
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
    @@ -166,6 +166,40 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTRegressor()
    +      .setMaxDepth(3)
    +      .setMaxIter(5)
    +      .setSubsamplingRate(1.0)
    +      .setStepSize(0.5)
    +      .setSeed(123)
    +      .setFeatureSubsetStrategy("all")
    +
    +    // In this data, feature 1 is very important.
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
    +    val categoricalFeatures = Map.empty[Int, Int]
    +    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
    +
    +    val importances = gbt.fit(df).featureImportances
    +    val mostImportantFeature = importances.argmax
    +    assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    --- End diff --
    
    done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #77504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77504/testReport)** for PR 18118 at commit [`16ccbdf`](https://github.com/apache/spark/commit/16ccbdfd8862c528c90fdde94c8ec20d6631126e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @sethah I agree with you. Sorry if I bothered you unnecessarily; I was eager to get reviews on the pull request.
    Thanks for the suggestion, I will keep it in mind.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123050801
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala ---
    @@ -150,11 +154,11 @@ class GBTRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: String)
         val instr = Instrumentation.create(this, oldDataset)
         instr.logParams(labelCol, featuresCol, predictionCol, impurity, lossType,
           maxDepth, maxBins, maxIter, maxMemoryInMB, minInfoGain, minInstancesPerNode,
    -      seed, stepSize, subsamplingRate, cacheNodeIds, checkpointInterval)
    +      seed, stepSize, subsamplingRate, cacheNodeIds, checkpointInterval, featureSubsetStrategy)
         instr.logNumFeatures(numFeatures)
     
         val (baseLearners, learnerWeights) = GradientBoostedTrees.run(oldDataset, boostingStrategy,
    -      $(seed))
    +      $(seed), getFeatureSubsetStrategy)
    --- End diff --
    
    use `$(featureSubsetStrategy)`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123264390
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
    @@ -49,14 +49,16 @@ import org.apache.spark.rdd.RDD
     @Since("1.2.0")
     class GradientBoostedTrees private[spark] (
         private val boostingStrategy: BoostingStrategy,
    -    private val seed: Int)
    +    private val seed: Int,
    +    private val featureSubsetStrategy: String)
    --- End diff --
    
    done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #83283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83283/testReport)** for PR 18118 at commit [`77aa1d7`](https://github.com/apache/spark/commit/77aa1d77c8997d1f5eb2e8485194584f34832ba8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r125484140
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
    @@ -354,6 +356,47 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTClassifier()
    +      .setImpurity("Gini")
    +      .setMaxDepth(3)
    +      .setMaxIter(5)
    +      .setSubsamplingRate(1.0)
    +      .setStepSize(0.5)
    +      .setSeed(123)
    +      .setFeatureSubsetStrategy("all")
    +
    +    // In this data, feature 1 is very important.
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
    +    val categoricalFeatures = Map.empty[Int, Int]
    +    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
    +
    +    val importances = gbt.fit(df).featureImportances
    +    val mostImportantFeature = importances.argmax
    +    assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    +    assert(importances.toArray.forall(_ >= 0.0))
    +
    +    val gbtWithFeatureSubset = new GBTClassifier()
    --- End diff --
    
    done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    ping @sethah, please let me know if there is any update on this. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @MLnick @sethah, please find some time to look into this. It would be great if we could include this feature in Spark 2.3.




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148197388
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
    @@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTClassifier()
    +      .setImpurity("Gini")
    +      .setMaxDepth(3)
    +      .setMaxIter(5)
    +      .setSubsamplingRate(1.0)
    +      .setStepSize(0.5)
    +      .setSeed(123)
    +      .setFeatureSubsetStrategy("all")
    +
    +    // In this data, feature 1 is very important.
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
    +    val categoricalFeatures = Map.empty[Int, Int]
    +    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
    +
    +    val importances = gbt.fit(df).featureImportances
    +    val mostImportantFeature = importances.argmax
    +    assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    +    assert(importances.toArray.forall(_ >= 0.0))
    +
    +    // GBT with different featureSubsetStrategy
    +    val gbtWithFeatureSubset = gbt.setFeatureSubsetStrategy("1")
    +    val importanceFeatures = gbtWithFeatureSubset.fit(df).featureImportances
    +    val mostIF = importanceFeatures.argmax
    +    assert(!(mostImportantFeature === mostIF))
    +    assert(importanceFeatures.toArray.sum === 1.0)
    --- End diff --
    
    done




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @sethah Thanks for the comments. I have made the changes as suggested in the PR description.
    
    I'll wait for the review comments from your side :)




[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123040612
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala ---
    @@ -73,19 +75,21 @@ private[spark] object GradientBoostedTrees extends Logging {
           input: RDD[LabeledPoint],
           validationInput: RDD[LabeledPoint],
           boostingStrategy: OldBoostingStrategy,
    -      seed: Long): (Array[DecisionTreeRegressionModel], Array[Double]) = {
    +      seed: Long,
    +      featureSubsetStrategy: String): (Array[DecisionTreeRegressionModel], Array[Double]) = {
    --- End diff --
    
    Probably better to add this to `BoostingStrategy` and then you don't need to manually pass it everywhere.
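
The suggestion above can be illustrated with a minimal sketch. The names `SketchBoostingStrategy` and `SketchBoosting` are hypothetical, simplified stand-ins and are not Spark's real classes; the point is only that a setting carried inside the strategy object reaches every layer without changing any method signature.

```scala
// Hypothetical, simplified stand-ins; these are NOT Spark's real classes.
final case class SketchBoostingStrategy(
    numIterations: Int,
    learningRate: Double,
    featureSubsetStrategy: String = "all")

object SketchBoosting {
  // Each layer receives only the strategy object, so adding a new setting
  // does not change any of these signatures.
  def run(strategy: SketchBoostingStrategy): Seq[String] =
    boost(strategy)

  private def boost(strategy: SketchBoostingStrategy): Seq[String] =
    (0 until strategy.numIterations).map(i => trainTree(i, strategy))

  // Placeholder for real tree training: it only reports which setting it saw.
  private def trainTree(iteration: Int, strategy: SketchBoostingStrategy): String =
    s"tree $iteration trained with featureSubsetStrategy=${strategy.featureSubsetStrategy}"
}
```

The trade-off is that the strategy class gains a field, whereas passing the value as an extra argument leaves the strategy class untouched but widens every signature along the call path.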




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #78384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78384/testReport)** for PR 18118 at commit [`d0bb8bc`](https://github.com/apache/spark/commit/d0bb8bc43e8080492a49505b7a701bca165b03c8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #83372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83372/testReport)** for PR 18118 at commit [`0e9507e`](https://github.com/apache/spark/commit/0e9507e13a352b59798993228709bf7654747d0c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @MLnick Please find some time to review it and let me know if we can proceed with this. Thanks




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123264336
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala ---
    @@ -319,8 +327,10 @@ private[spark] object GradientBoostedTrees extends Logging {
           logDebug("###################################################")
           logDebug("Gradient boosting tree iteration " + m)
           logDebug("###################################################")
    +
           val dt = new DecisionTreeRegressor().setSeed(seed + m)
    -      val model = dt.train(data, treeStrategy)
    +
    --- End diff --
    
    done




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148065729
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
    @@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
     
       /** (private[ml]) Train a decision tree on an RDD */
       private[ml] def train(data: RDD[LabeledPoint],
    -      oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
    +      oldStrategy: OldStrategy, featureSubsetStrategy: String): DecisionTreeRegressionModel = {
    --- End diff --
    
    follow Spark style here
    
    ```scala
      private[ml] def train(
          data: RDD[LabeledPoint],
          oldStrategy: OldStrategy,
          featureSubsetStrategy: String): DecisionTreeRegressionModel = {
    ```




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123269246
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -359,38 +365,6 @@ private[ml] trait TreeEnsembleParams extends DecisionTreeParams {
           oldImpurity: OldImpurity): OldStrategy = {
         super.getOldStrategy(categoricalFeatures, numClasses, oldAlgo, oldImpurity, getSubsamplingRate)
       }
    -}
    -
    -/**
    - * Parameters for Random Forest algorithms.
    - */
    -private[ml] trait RandomForestParams extends TreeEnsembleParams {
    --- End diff --
    
    @MLnick 
     
    Previously, featureSubsetStrategy, setFeatureSubsetStrategy, and getFeatureSubsetStrategy were part of RandomForestParams. I moved them to TreeEnsembleParams so that both random forest and GBT can access them. I have not moved anything else.
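
The move described above can be sketched with plain Scala traits. All `Sketch*` names are hypothetical stand-ins: Spark's real traits use `Param` objects and have many more members, and the `"auto"` default here is only a placeholder. The sketch shows why hoisting the getter and setter into the shared parent makes them available to both ensemble families.

```scala
// Hypothetical, simplified stand-ins for the traits named in the comment;
// Spark's real traits use Param objects and have many more members.
trait SketchTreeEnsembleParams {
  // Placeholder default, chosen only for illustration.
  private var strategy: String = "auto"

  def getFeatureSubsetStrategy: String = strategy

  // Returns this.type so calls can be chained on the concrete class.
  def setFeatureSubsetStrategy(value: String): this.type = {
    strategy = value
    this
  }
}

// Both ensemble families inherit the getter and setter from the shared parent.
trait SketchRandomForestParams extends SketchTreeEnsembleParams
trait SketchGBTParams extends SketchTreeEnsembleParams

class SketchRandomForest extends SketchRandomForestParams
class SketchGBT extends SketchGBTParams
```

With this layout, `new SketchGBT().setFeatureSubsetStrategy("sqrt")` compiles without any GBT-specific setter.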




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @MLnick @srowen 
    Thanks for your comments. I will wait for someone to review. :)




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Could one of the admins please review the pull request? It would be really helpful. Thanks.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Could one of the admins please review the pull request?




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Can one of the admins verify this patch?




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #77588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77588/testReport)** for PR 18118 at commit [`12d83aa`](https://github.com/apache/spark/commit/12d83aaa5f24e17304de760c5664b8ede55d678a).




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @mpjlu 
    Please find some time to review the code.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    I don't think there's any point in pinging every day :) 




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #83631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83631/testReport)** for PR 18118 at commit [`ea03683`](https://github.com/apache/spark/commit/ea03683a4c388eaee70bf66fc41fd89a3a81a6a3).




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #78420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78420/testReport)** for PR 18118 at commit [`7970293`](https://github.com/apache/spark/commit/79702933d321051222073057b25305831df84c6d).




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #83372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83372/testReport)** for PR 18118 at commit [`0e9507e`](https://github.com/apache/spark/commit/0e9507e13a352b59798993228709bf7654747d0c).




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r125484168
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
    @@ -166,6 +166,45 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTRegressor()
    +      .setMaxDepth(3)
    +      .setMaxIter(5)
    +      .setSubsamplingRate(1.0)
    +      .setStepSize(0.5)
    +      .setSeed(123)
    +      .setFeatureSubsetStrategy("all")
    +
    +    // In this data, feature 1 is very important.
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
    +    val categoricalFeatures = Map.empty[Int, Int]
    +    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
    +
    +    val importances = gbt.fit(df).featureImportances
    +    val mostImportantFeature = importances.argmax
    +    assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    +    assert(importances.toArray.forall(_ >= 0.0))
    +
    +    val gbtWithFeatureSubset = new GBTRegressor()
    --- End diff --
    
    done




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    I think this may be a good feature to have, but haven't had bandwidth to
    take a look and won't for the next week or so.
    
    Not sure if @sethah has any time, or someone else who has been doing stuff
    with trees recently.
    
    As far as I can see this param setting is similar to xgboost "col sample by
    level" - so it's not a feature sample by tree built (which is subsample
    rate param in spark) but a feature sample by node built.
    
    In any case RF does expose it and GBT doesn't and it does seem like it
    could be exposed as an option. I don't see any downside or issue with it.
    
    So it may just take some patience right now until someone can give a review.
    On Mon, 12 Jun 2017 at 10:41, Sean Owen <no...@github.com> wrote:
    
    > @pralabhkumar <https://github.com/pralabhkumar> if there is no support
    > for this we should close it. Please stop commenting
    >
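
The distinction drawn above between sampling once per tree and sampling at every node can be sketched in plain Scala. `SketchSampling` and its methods are hypothetical helpers, not Spark or xgboost code. As far as Spark's documentation describes them, `subsamplingRate` is the fraction of training rows used for each tree, while `featureSubsetStrategy` controls how many features are considered for splits at each tree node; the sketch assumes that reading.

```scala
import scala.util.Random

// Hypothetical helpers for illustration only; not Spark or xgboost code.
object SketchSampling {
  // Per-tree row subsampling: one draw of row indices, reused for the whole tree.
  def sampleRows(numRows: Int, rate: Double, rng: Random): Seq[Int] =
    (0 until numRows).filter(_ => rng.nextDouble() < rate)

  // Per-node feature subsetting: called again at every node, so each node
  // may see a different set of candidate features.
  def sampleFeatures(numFeatures: Int, numPerNode: Int, rng: Random): Seq[Int] =
    rng.shuffle((0 until numFeatures).toList).take(numPerNode).sorted
}
```

In a tree builder, `sampleRows` would run once before the tree is grown and `sampleFeatures` would run each time a node is split, which is the per-node behaviour the comment compares to xgboost's column sampling by level.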





[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @sethah 
    
    Thanks for reviewing the pull request.
    
    - Change the title to obey the proper format [SPARK-20199][ML] ...
    
    - Response: Done
    
    - Change title to reflect that both GBTClassifier and GBTRegressor are changed
    
    - Response: Done
    
    - Please remove all the text you did not write from the PR description
    
    - Response: Done
    
    - Add a test to check that the default values are correct for GBTClassifier/Regressor. See the test in logistic regression titled: "logistic regression: default params" for reference
    
    - Response: Done
    
    - I'd like to test that this change takes effect. One way might be to construct a small dataset where one feature is highly predictive and other features are less so, train with featureSubsetStrategy = "all" and with featureSubsetStrategy = "1" and they should not produce the same tree. I'm open to other, simpler ways to test it if you can think of some.
    
    - Response: Added a test case for the featureSubsetStrategy parameter. It trains two GBT models, one with subset strategy "all" and the other with "1", and compares their most important feature and feature importance vectors to make sure the trees are different.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77497/
    Test FAILed.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @pralabhkumar if there is no support for this we should close it. Please stop commenting




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #79147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79147/testReport)** for PR 18118 at commit [`61745ba`](https://github.com/apache/spark/commit/61745ba2e5be5f4838a23a226ef67142dd6090cf).




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83631/
    Test PASSed.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @MLnick @sethah it's been more than a couple of months since the code changes were made as suggested. It would be really great if you could find some time to review the pull request.




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r125427950
  
    --- Diff: project/MimaExcludes.scala ---
    @@ -196,7 +196,10 @@ object MimaExcludes {
           ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.streaming.StreamingQueryException.startOffset"),
           ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.streaming.StreamingQueryException.endOffset"),
           ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.streaming.StreamingQueryException.this"),
    -      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryException.query")
    +      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryException.query"),
    +    
    +     // [SPARK-20199][MLLIB] Add featureSubSet to GradientBoostedTrees
    +     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.tree.GradientBoostedTrees.this")
    --- End diff --
    
    Do we really still need this MiMa exclusion?




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r125427310
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
    @@ -166,6 +166,45 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTRegressor()
    +      .setMaxDepth(3)
    +      .setMaxIter(5)
    +      .setSubsamplingRate(1.0)
    +      .setStepSize(0.5)
    +      .setSeed(123)
    +      .setFeatureSubsetStrategy("all")
    +
    +    // In this data, feature 1 is very important.
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
    +    val categoricalFeatures = Map.empty[Int, Int]
    +    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
    +
    +    val importances = gbt.fit(df).featureImportances
    +    val mostImportantFeature = importances.argmax
    +    assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    +    assert(importances.toArray.forall(_ >= 0.0))
    +
    +    val gbtWithFeatureSubset = new GBTRegressor()
    --- End diff --
    
    Same comment applies here




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148095940
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
    @@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTClassifier()
    +      .setImpurity("Gini")
    +      .setMaxDepth(3)
    +      .setMaxIter(5)
    +      .setSubsamplingRate(1.0)
    +      .setStepSize(0.5)
    +      .setSeed(123)
    +      .setFeatureSubsetStrategy("all")
    +
    +    // In this data, feature 1 is very important.
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
    +    val categoricalFeatures = Map.empty[Int, Int]
    +    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
    +
    +    val importances = gbt.fit(df).featureImportances
    +    val mostImportantFeature = importances.argmax
    +    assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    --- End diff --
    
    the last two assertions here aren't necessary




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    ping @sethah @MLnick 




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #77497 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77497/testReport)** for PR 18118 at commit [`b0444fa`](https://github.com/apache/spark/commit/b0444fa75f4cc33a0c35cf88664a89a1c425e7a1).




[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by mpjlu <gi...@git.apache.org>.
Github user mpjlu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r119271935
  
    --- Diff: project/MimaExcludes.scala ---
    @@ -37,11 +37,15 @@ object MimaExcludes {
       // Exclude rules for 2.3.x
       lazy val v23excludes = v22excludes ++ Seq(
         // [SPARK-20495][SQL] Add StorageLevel to cacheTable API
    -    ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.cacheTable")
    +    ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.cacheTable"),
    +   
    +    // [SPARK-20199][MLLIB] Add featureSubSet to GradientBoostedTrees
    +    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.tree.GradientBoostedTrees.this")
       )
    --- End diff --
    
    is this the right place?




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123264432
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
    @@ -136,6 +136,10 @@ class GBTClassifier @Since("1.4.0") (
       @Since("1.4.0")
       override def setStepSize(value: Double): this.type = set(stepSize, value)
     
    +  /** @group setParam */
    +  override def setFeatureSubsetStrategy(value: String): this.type =
    --- End diff --
    
    done




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83283/
    Test PASSed.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @mpjlu: Please review the changes.




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148197285
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
    @@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
     
       /** (private[ml]) Train a decision tree on an RDD */
       private[ml] def train(data: RDD[LabeledPoint],
    -      oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
    +      oldStrategy: OldStrategy, featureSubsetStrategy: String): DecisionTreeRegressionModel = {
         val instr = Instrumentation.create(this, data)
         instr.logParams(params: _*)
     
    -    val trees = RandomForest.run(data, oldStrategy, numTrees = 1, featureSubsetStrategy = "all",
    +    val trees = RandomForest.run(data, oldStrategy, numTrees = 1,
    +      featureSubsetStrategy,
    --- End diff --
    
    done




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    16ccbdf is successful. Please review the pull request.
    @MLnick @sethah @mpjlu @srowen 




[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by mpjlu <gi...@git.apache.org>.
Github user mpjlu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r119271607
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -420,18 +394,18 @@ private[ml] trait RandomForestParams extends TreeEnsembleParams {
        */
       final val featureSubsetStrategy: Param[String] = new Param[String](this, "featureSubsetStrategy",
         "The number of features to consider for splits at each tree node." +
    -      s" Supported options: ${RandomForestParams.supportedFeatureSubsetStrategies.mkString(", ")}" +
    +      s" Supported options: ${TreeEnsembleParams.supportedFeatureSubsetStrategies.mkString(", ")}" +
           s", (0.0-1.0], [1-n].",
         (value: String) =>
    -      RandomForestParams.supportedFeatureSubsetStrategies.contains(
    +      TreeEnsembleParams.supportedFeatureSubsetStrategies.contains(
             value.toLowerCase(Locale.ROOT))
    -      || Try(value.toInt).filter(_ > 0).isSuccess
    -      || Try(value.toDouble).filter(_ > 0).filter(_ <= 1.0).isSuccess)
    +        || Try(value.toInt).filter(_ > 0).isSuccess
    +        || Try(value.toDouble).filter(_ > 0).filter(_ <= 1.0).isSuccess)
    --- End diff --
    
    The indentation here is wrong.
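
For readers following the diff above: the validator accepts a value if it is one of the named strategies (compared case-insensitively), a positive integer, or a fraction in (0.0, 1.0]. A minimal standalone sketch of that predicate follows. The object and method names are hypothetical, and the list of named options (`auto`, `all`, `onethird`, `sqrt`, `log2`) is an assumption, since this thread does not show it.

```scala
import java.util.Locale
import scala.util.Try

object FeatureSubsetStrategyCheck {
  // Assumed list of named strategies; the thread itself does not show it.
  val supported: Set[String] = Set("auto", "all", "onethird", "sqrt", "log2")

  // Mirrors the three-way check in the quoted diff:
  // named option, positive integer, or fraction in (0.0, 1.0].
  def isValid(value: String): Boolean =
    supported.contains(value.toLowerCase(Locale.ROOT)) ||
      Try(value.toInt).filter(_ > 0).isSuccess ||
      Try(value.toDouble).filter(_ > 0).filter(_ <= 1.0).isSuccess
}
```

Ending each line with `||` rather than starting the next line with it sidesteps the continuation-line indentation question that the review comment raises.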




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79147/
    Test FAILed.




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148197380
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
    @@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTClassifier()
    +      .setImpurity("Gini")
    +      .setMaxDepth(3)
    +      .setMaxIter(5)
    +      .setSubsamplingRate(1.0)
    +      .setStepSize(0.5)
    +      .setSeed(123)
    +      .setFeatureSubsetStrategy("all")
    +
    +    // In this data, feature 1 is very important.
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
    +    val categoricalFeatures = Map.empty[Int, Int]
    +    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
    +
    +    val importances = gbt.fit(df).featureImportances
    +    val mostImportantFeature = importances.argmax
    +    assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    --- End diff --
    
    done




[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123051195
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
    @@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
     
       /** (private[ml]) Train a decision tree on an RDD */
       private[ml] def train(data: RDD[LabeledPoint],
    -      oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
    +      oldStrategy: OldStrategy, featureSubsetStrategy: String): DecisionTreeRegressionModel = {
         val instr = Instrumentation.create(this, data)
         instr.logParams(params: _*)
     
    -    val trees = RandomForest.run(data, oldStrategy, numTrees = 1, featureSubsetStrategy = "all",
    +    val trees = RandomForest.run(data, oldStrategy, numTrees = 1,
    +      featureSubsetStrategy = featureSubsetStrategy,
    --- End diff --
    
    don't use a named parameter since the meaning is now clear
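
To illustrate the point: a named argument helps when a bare literal is passed, but adds little when the variable already carries the parameter's name. A small sketch using a hypothetical `run` function (not the real `RandomForest.run` signature):

```scala
object NamedArgDemo {
  // Hypothetical function; it only describes the arguments it received.
  def run(numTrees: Int, featureSubsetStrategy: String): String =
    s"numTrees=$numTrees, featureSubsetStrategy=$featureSubsetStrategy"

  // A bare literal benefits from the parameter name.
  val withLiteral: String = run(numTrees = 1, featureSubsetStrategy = "all")

  // Here the variable name already documents the argument,
  // so it can be passed positionally.
  val featureSubsetStrategy = "sqrt"
  val withVariable: String = run(numTrees = 1, featureSubsetStrategy)
}
```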




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123264215
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala ---
    @@ -73,19 +75,21 @@ private[spark] object GradientBoostedTrees extends Logging {
           input: RDD[LabeledPoint],
           validationInput: RDD[LabeledPoint],
           boostingStrategy: OldBoostingStrategy,
    -      seed: Long): (Array[DecisionTreeRegressionModel], Array[Double]) = {
    +      seed: Long,
    +      featureSubsetStrategy: String): (Array[DecisionTreeRegressionModel], Array[Double]) = {
    --- End diff --
    
    @sethah I tried to make it similar to RandomForest.scala, which has strategy and featureSubsetStrategy as separate parameters.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    LGTM, merged to master. Thanks!




[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r119364985
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -420,18 +394,18 @@ private[ml] trait RandomForestParams extends TreeEnsembleParams {
        */
       final val featureSubsetStrategy: Param[String] = new Param[String](this, "featureSubsetStrategy",
         "The number of features to consider for splits at each tree node." +
    -      s" Supported options: ${RandomForestParams.supportedFeatureSubsetStrategies.mkString(", ")}" +
    +      s" Supported options: ${TreeEnsembleParams.supportedFeatureSubsetStrategies.mkString(", ")}" +
           s", (0.0-1.0], [1-n].",
         (value: String) =>
    -      RandomForestParams.supportedFeatureSubsetStrategies.contains(
    +      TreeEnsembleParams.supportedFeatureSubsetStrategies.contains(
             value.toLowerCase(Locale.ROOT))
    -      || Try(value.toInt).filter(_ > 0).isSuccess
    -      || Try(value.toDouble).filter(_ > 0).filter(_ <= 1.0).isSuccess)
    +        || Try(value.toInt).filter(_ > 0).isSuccess
    +        || Try(value.toDouble).filter(_ > 0).filter(_ <= 1.0).isSuccess)
    --- End diff --
    
    done




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #77504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77504/testReport)** for PR 18118 at commit [`16ccbdf`](https://github.com/apache/spark/commit/16ccbdfd8862c528c90fdde94c8ec20d6631126e).




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r125427134
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
    @@ -354,6 +356,47 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTClassifier()
    +      .setImpurity("Gini")
    +      .setMaxDepth(3)
    +      .setMaxIter(5)
    +      .setSubsamplingRate(1.0)
    +      .setStepSize(0.5)
    +      .setSeed(123)
    +      .setFeatureSubsetStrategy("all")
    +
    +    // In this data, feature 1 is very important.
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
    +    val categoricalFeatures = Map.empty[Int, Int]
    +    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
    +
    +    val importances = gbt.fit(df).featureImportances
    +    val mostImportantFeature = importances.argmax
    +    assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    +    assert(importances.toArray.forall(_ >= 0.0))
    +
    +    val gbtWithFeatureSubset = new GBTClassifier()
    --- End diff --
    
    Not necessary to have an entirely new estimator here - you can just use `val gbtWithFeatureSubset = gbt.setFeatureSubsetStrategy(...)`
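
The suggestion relies on setters that return `this.type`: calling one updates the existing estimator and hands back the same instance, so every other setting carries over. A minimal sketch with a hypothetical `ToyEstimator` class (a stand-in, not Spark's actual `GBTClassifier`) shows the behaviour:

```scala
// Hypothetical stand-in for an estimator with fluent setters.
class ToyEstimator {
  private var featureSubsetStrategy: String = "all"
  private var maxDepth: Int = 5

  def setFeatureSubsetStrategy(value: String): this.type = {
    featureSubsetStrategy = value
    this
  }

  def setMaxDepth(value: Int): this.type = {
    maxDepth = value
    this
  }

  def getFeatureSubsetStrategy: String = featureSubsetStrategy
  def getMaxDepth: Int = maxDepth
}

object ToyEstimatorDemo {
  val gbt: ToyEstimator =
    new ToyEstimator().setMaxDepth(3).setFeatureSubsetStrategy("all")

  // Same instance with one parameter changed; maxDepth is still 3.
  val gbtWithFeatureSubset: ToyEstimator = gbt.setFeatureSubsetStrategy("1")
}
```

In this sketch `gbt` and `gbtWithFeatureSubset` refer to the same object, so a test written this way would need to fit the first model before calling the setter again.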




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #78417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78417/testReport)** for PR 18118 at commit [`13ce412`](https://github.com/apache/spark/commit/13ce412ce47ea21b8850c47e361ad872425bfb9a).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    ping @mpjlu. Please review the pull request.




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78384/
    Test PASSed.


---


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79150/
    Test PASSed.


---


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148197272
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
    @@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
     
       /** (private[ml]) Train a decision tree on an RDD */
       private[ml] def train(data: RDD[LabeledPoint],
    -      oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
    +      oldStrategy: OldStrategy, featureSubsetStrategy: String): DecisionTreeRegressionModel = {
    --- End diff --
    
    done


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r119365001
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -441,12 +415,44 @@ private[ml] trait RandomForestParams extends TreeEnsembleParams {
       final def getFeatureSubsetStrategy: String = $(featureSubsetStrategy).toLowerCase(Locale.ROOT)
     }
     
    -private[spark] object RandomForestParams {
    -  // These options should be lowercase.
    -  final val supportedFeatureSubsetStrategies: Array[String] =
    -    Array("auto", "all", "onethird", "sqrt", "log2").map(_.toLowerCase(Locale.ROOT))
    +
    +
    +/**
    + * Parameters for Random Forest algorithms.
    + */
    +private[ml] trait RandomForestParams extends TreeEnsembleParams {
    +
    +  /**
    +   * Number of trees to train (>= 1).
    +   * If 1, then no bootstrapping is used.  If > 1, then bootstrapping is done.
    +   * TODO: Change to always do bootstrapping (simpler).  SPARK-7130
    +   * (default = 20)
    +   *
    +   * Note: The reason that we cannot add this to both GBT and RF (i.e. in TreeEnsembleParams)
    +   * is the param `maxIter` controls how many trees a GBT has. The semantics in the algorithms
    +   * are a bit different.
    +   * @group param
    +   */
    +  final val numTrees: IntParam = new IntParam(this, "numTrees", "Number of trees to train (>= 1)",
    +    ParamValidators.gtEq(1))
    +
    +  setDefault(numTrees -> 20)
    +
    +  /**
    +   * @deprecated This method is deprecated and will be removed in 3.0.0.
    +   * @group setParam
    +   */
    +  @deprecated("This method is deprecated and will be removed in 3.0.0.", "2.1.0")
    +  def setNumTrees(value: Int): this.type = set(numTrees, value)
    +
    +  /** @group getParam */
    +  final def getNumTrees: Int = $(numTrees)
    +
    +
     }
     
    +
    +
    --- End diff --
    
    done


---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @sethah Please find some time to look into the changes.
    
    Please let me know if further changes are required.


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Jenkins test this please


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Can one of the admins please verify this patch?


---


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r149760874
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
    @@ -166,6 +166,40 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTRegressor()
    +      .setMaxDepth(3)
    +      .setMaxIter(5)
    +      .setSubsamplingRate(1.0)
    +      .setStepSize(0.5)
    +      .setSeed(123)
    +      .setFeatureSubsetStrategy("all")
    +
    +    // In this data, feature 1 is very important.
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
    +    val categoricalFeatures = Map.empty[Int, Int]
    +    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
    +
    +    val importances = gbt.fit(df).featureImportances
    +    val mostImportantFeature = importances.argmax
    +    assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    --- End diff --
    
    You've kept the other assertions in this test (that were removed from the classifier test as per https://github.com/apache/spark/pull/18118#discussion_r148095940). If they're not necessary we should also remove them here.


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83627/
    Test PASSed.


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r149758719
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala ---
    @@ -173,6 +178,10 @@ object GBTRegressor extends DefaultParamsReadable[GBTRegressor] {
     
       @Since("2.0.0")
       override def load(path: String): GBTRegressor = super.load(path)
    +
    +  @Since("2.3.0")
    --- End diff --
    
    You removed this from the classifier but not here (as per https://github.com/apache/spark/pull/18118#discussion_r148063169)


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @sethah I have made the changes as suggested, but the build is failing with this error:
    Step 'Publish JUnit test result report' failed: No test report files were found. Configuration error?
    
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83354/
    
    Please help with this.
    
    



---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Can one of the admins review the pull request?


---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Merged build finished. Test FAILed.


---


[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123049135
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
    @@ -136,6 +136,10 @@ class GBTClassifier @Since("1.4.0") (
       @Since("1.4.0")
       override def setStepSize(value: Double): this.type = set(stepSize, value)
     
    +  /** @group setParam */
    +  override def setFeatureSubsetStrategy(value: String): this.type =
    --- End diff --
    
    add since tag


---


[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r119364950
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -305,7 +305,7 @@ private[ml] object TreeRegressorParams {
     }
     
     private[ml] trait DecisionTreeRegressorParams extends DecisionTreeParams
    -  with TreeRegressorParams with HasVarianceCol {
    +  with TreeRegressorParams with HasVarianceCol  {
    --- End diff --
    
    done


---


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #77588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77588/testReport)** for PR 18118 at commit [`12d83aa`](https://github.com/apache/spark/commit/12d83aaa5f24e17304de760c5664b8ede55d678a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    I have made the changes suggested by @mpjlu.
    
    Please find some time to review the pull request.



---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83372/
    Test PASSed.


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148197248
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
    @@ -192,6 +197,10 @@ object GBTClassifier extends DefaultParamsReadable[GBTClassifier] {
     
       @Since("2.0.0")
       override def load(path: String): GBTClassifier = super.load(path)
    +
    +  @Since("2.3.0")
    +  final val supportedFeatureSubsetStrategies: Array[String] =
    --- End diff --
    
    done


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #77501 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77501/testReport)** for PR 18118 at commit [`9b3d03b`](https://github.com/apache/spark/commit/9b3d03ba99e69176e94b518a3f8cd2ac9e55d10e).


---


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148063169
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
    @@ -192,6 +197,10 @@ object GBTClassifier extends DefaultParamsReadable[GBTClassifier] {
     
       @Since("2.0.0")
       override def load(path: String): GBTClassifier = super.load(path)
    +
    +  @Since("2.3.0")
    +  final val supportedFeatureSubsetStrategies: Array[String] =
    --- End diff --
    
    I know this exists for RandomForest but I don't see a need for it. The supported options can be obtained through the param doc. I prefer to leave it out.
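
For readers unfamiliar with the option names, here is a minimal Scala sketch of how the five named strategies quoted in the diffs of this thread ("auto", "all", "onethird", "sqrt", "log2") are usually described as mapping to a per-node feature count. It is not the code from this pull request. `FeatureSubsetSketch` and `numFeaturesPerNode` are hypothetical names, and the rounding rules and the way "auto" is resolved are assumptions based on the documented behaviour of the parameter.

```scala
object FeatureSubsetSketch {
  // The five named strategies that appear in the diffs of this thread.
  val supported: Set[String] = Set("auto", "all", "onethird", "sqrt", "log2")

  /** Hypothetical helper: how many features are considered at each split. */
  def numFeaturesPerNode(
      strategy: String,
      numFeatures: Int,
      numTrees: Int,
      isClassification: Boolean): Int = {
    require(numFeatures >= 1, "numFeatures must be >= 1")
    val s = strategy.toLowerCase(java.util.Locale.ROOT)
    require(supported.contains(s), s"Unsupported strategy: $strategy")

    // Assumption: "auto" means "all" for a single tree, otherwise
    // "sqrt" for classification and "onethird" for regression.
    val resolved = s match {
      case "auto" =>
        if (numTrees == 1) "all"
        else if (isClassification) "sqrt"
        else "onethird"
      case other => other
    }

    // Assumption: fractional counts are rounded up, with a floor of 1 for log2.
    resolved match {
      case "all" => numFeatures
      case "sqrt" => math.ceil(math.sqrt(numFeatures.toDouble)).toInt
      case "log2" =>
        math.max(1, math.ceil(math.log(numFeatures.toDouble) / math.log(2.0)).toInt)
      case "onethird" => math.ceil(numFeatures / 3.0).toInt
    }
  }
}
```

Under these assumptions, `FeatureSubsetSketch.numFeaturesPerNode("sqrt", 16, 5, true)` evaluates to 4, and "all" always returns the full feature count.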


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r123049728
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
    @@ -192,6 +196,9 @@ object GBTClassifier extends DefaultParamsReadable[GBTClassifier] {
     
       @Since("2.0.0")
       override def load(path: String): GBTClassifier = super.load(path)
    +
    +  final val supportedFeatureSubsetStrategies: Array[String] =
    --- End diff --
    
    public vals and methods require since tags. Also, this wasn't added to `GBTRegressor`
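
As a rough illustration of the tagging convention this comment asks for, the sketch below annotates a public setter, a getter, and a public val. It is only a sketch: `Since` here is a local stand-in so the snippet compiles without Spark (the real annotation is provided by Spark), `GbtSetterSketch` and `GbtSketchConstants` are hypothetical names, the `"all"` default is assumed for illustration, and the version string 2.3.0 is the one that appears in other diffs in this thread.

```scala
import java.util.Locale
import scala.annotation.StaticAnnotation

// Local stand-in for Spark's @Since annotation so the sketch compiles without Spark.
class Since(version: String) extends StaticAnnotation

// Hypothetical class: the real estimator stores the value in a Param, not a var.
class GbtSetterSketch {
  // Assumed default, for illustration only.
  private var featureSubsetStrategy: String = "all"

  /** @group setParam */
  @Since("2.3.0")
  def setFeatureSubsetStrategy(value: String): this.type = {
    featureSubsetStrategy = value
    this
  }

  /** @group getParam */
  @Since("2.3.0")
  def getFeatureSubsetStrategy: String =
    featureSubsetStrategy.toLowerCase(Locale.ROOT)
}

// Hypothetical holder object: public vals carry the tag as well.
object GbtSketchConstants {
  @Since("2.3.0")
  final val supportedFeatureSubsetStrategies: Array[String] =
    Array("auto", "all", "onethird", "sqrt", "log2")
}
```

Returning `this.type` from the setter keeps the builder-style chaining seen in the test diffs of this thread, for example `new GbtSetterSketch().setFeatureSubsetStrategy("sqrt")`.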


---


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    [~arushkharbanda][~peng.meng@intel.com][~facai] [~srowen]
    
    Please review the pull request and the approach.


---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #78384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78384/testReport)** for PR 18118 at commit [`d0bb8bc`](https://github.com/apache/spark/commit/d0bb8bc43e8080492a49505b7a701bca165b03c8).


---


[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #77578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77578/testReport)** for PR 18118 at commit [`f9aa30c`](https://github.com/apache/spark/commit/f9aa30c175ab143418b54c1910783d6af94b6662).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

Posted by pralabhkumar <gi...@git.apache.org>.
Github user pralabhkumar commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    @sethah It's still failing. I don't think the issue is on my side. Please help.


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18118
  
    **[Test build #77503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77503/testReport)** for PR 18118 at commit [`426bc68`](https://github.com/apache/spark/commit/426bc68e4b75fba76d993e0da94d28982b449c72).


---


[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18118#discussion_r148564380
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
    @@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
       }
     
       /////////////////////////////////////////////////////////////////////////////
    +  // Tests of feature subset strategy
    +  /////////////////////////////////////////////////////////////////////////////
    +  test("Tests of feature subset strategy") {
    +    val numClasses = 2
    +    val gbt = new GBTClassifier()
    --- End diff --
    
    Setting maxIter and maxDepth to some small value is actually useful so the tests don't take unnecessarily long.


---
