Posted to user@spark.apache.org by Scott Imig <si...@richrelevance.com> on 2016/01/15 17:06:28 UTC

Feature importance for RandomForestRegressor in Spark 1.5

Hello,

I have a couple of quick questions about this pull request, which adds feature importance calculations to the random forests in MLlib.

https://github.com/apache/spark/pull/7838

1. Can someone help me determine the Spark version where this is first available?  (1.5.0?  1.5.1?)

2. Following the templates in this documentation to construct a RandomForestModel, should I be able to retrieve model.featureImportances?  Or is there a different pattern for random forests in more recent Spark versions?

https://spark.apache.org/docs/1.2.0/mllib-ensembles.html

Thanks for the help!
Imig
--
S. Imig | Senior Data Scientist Engineer | richrelevance |m: 425.999.5725

I support Bip 101 and BitcoinXT<https://bitcoinxt.software/>.

Re: Feature importance for RandomForestRegressor in Spark 1.5

Posted by Yanbo Liang <yb...@gmail.com>.
Hi Robin,

#1 This feature is available from Spark 1.5.0.
#2 You should use the new ML package rather than the old MLlib package to
train the Random Forest model and get featureImportances, because it is only
exposed in the ML package. You can refer to the documentation:
https://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier

Thanks
Yanbo
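
For illustration, here is a minimal sketch of what #2 might look like with the ML package. It assumes Spark 1.5.x run in local mode, where ML feature vectors still come from org.apache.spark.mllib.linalg (on Spark 2.x and later you would likely need org.apache.spark.ml.linalg and a SparkSession instead). The object name, the toy data, and the parameter values are made up for the example and are not taken from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.regression.RandomForestRegressor
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SQLContext

// Hypothetical object name, used only for this sketch.
object FeatureImportanceSketch {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption; drop setMaster when using spark-submit.
    val conf = new SparkConf()
      .setAppName("FeatureImportanceSketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Toy (label, features) rows invented for illustration.
    val training = sqlContext.createDataFrame(Seq(
      (1.0, Vectors.dense(1.0, 0.3, 5.0)),
      (2.0, Vectors.dense(2.0, 0.1, 4.0)),
      (3.0, Vectors.dense(3.0, 0.4, 6.0)),
      (4.0, Vectors.dense(4.0, 0.2, 5.5)),
      (5.0, Vectors.dense(5.0, 0.5, 4.5)),
      (6.0, Vectors.dense(6.0, 0.3, 5.2))
    )).toDF("label", "features")

    // The ML-package estimator, not mllib.tree.RandomForest.
    val rf = new RandomForestRegressor()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(20)
      .setSeed(42L)

    // fit returns a RandomForestRegressionModel.
    val model = rf.fit(training)

    // One importance value per feature, returned as a Vector.
    println(model.featureImportances)

    sc.stop()
  }
}
```

The model returned by the older mllib.tree.RandomForest.trainRegressor is a different class (mllib.tree.model.RandomForestModel), and per the reply above it does not expose featureImportances.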

2016-01-16 0:16 GMT+08:00 Robin East <ro...@xense.co.uk>:

> re 1.
> The pull requests reference the JIRA ticket in this case
> https://issues.apache.org/jira/browse/SPARK-5133. The JIRA says it was
> released in 1.5.

Re: Feature importance for RandomForestRegressor in Spark 1.5

Posted by Robin East <ro...@xense.co.uk>.
re 1.
The pull request references the JIRA ticket, in this case https://issues.apache.org/jira/browse/SPARK-5133. The JIRA says it was released in 1.5.


-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action




