Posted to dev@spark.apache.org by kidynamit <pa...@gmail.com> on 2014/12/11 13:41:09 UTC

Evaluation Metrics for Spark's MLlib

Hi, 

I would like to contribute to Spark's machine learning library by adding
evaluation metrics that can be used to gauge the accuracy of a model trained
on a given feature set. In particular, I would like to contribute k-fold
cross-validation and the F-beta metric, among others, on top of the current
MLlib framework.

Please advise on the steps I could take to contribute in this way.

Regards, 
kidynamit



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Evaluation-Metrics-for-Spark-s-MLlib-tp9727.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Evaluation Metrics for Spark's MLlib

Posted by Joseph Bradley <jo...@databricks.com>.
Hi, I'd recommend starting by checking out the existing helper
functionality for these tasks.  There are helper methods to do K-fold
cross-validation in MLUtils:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
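
In case it helps, here is a rough sketch of using that helper directly. It
assumes "data" is an RDD[LabeledPoint] you have already loaded; the choice of
5 folds and a logistic regression model is just for illustration:

    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.mllib.util.MLUtils

    // data: RDD[LabeledPoint], assumed to be loaded already
    // 5 folds, random seed 42
    val folds = MLUtils.kFold(data, 5, 42)

    // Train on each training split and measure accuracy on the held-out split.
    val accuracies = folds.map { case (training, validation) =>
      val model = new LogisticRegressionWithLBFGS().run(training)
      val correct = validation
        .map(p => (model.predict(p.features), p.label))
        .filter { case (prediction, label) => prediction == label }
        .count()
      correct.toDouble / validation.count()
    }
    val meanAccuracy = accuracies.sum / accuracies.length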

The experimental spark.ml API in the Spark 1.2 release (in branch-1.2 and
master) has a CrossValidator class which does this more automatically:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala
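
Roughly, usage looks like the sketch below. Note that the exact type fit()
accepts differs between the 1.2 release and later versions, and "training" is
assumed here to be a dataset with label and features columns loaded elsewhere:

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

    val lr = new LogisticRegression()

    // Grid of hyperparameters to search over during cross-validation.
    val paramGrid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.1, 0.01))
      .build()

    val cv = new CrossValidator()
      .setEstimator(lr)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(paramGrid)
      .setNumFolds(3)

    // training: dataset with "label" and "features" columns, assumed loaded elsewhere
    val cvModel = cv.fit(training)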

There are also a few evaluation metrics implemented:
https://github.com/apache/spark/tree/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation
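
For instance, MulticlassMetrics already exposes a per-label F-beta measure, so
part of the F-beta request above may already be covered. A quick sketch,
assuming "predictionAndLabels" is an RDD[(Double, Double)] of (prediction,
label) pairs produced by a trained model elsewhere:

    import org.apache.spark.mllib.evaluation.MulticlassMetrics

    // predictionAndLabels: RDD[(Double, Double)] of (prediction, label) pairs,
    // assumed to have been computed from a trained model already.
    val metrics = new MulticlassMetrics(predictionAndLabels)

    println(s"Weighted precision: ${metrics.weightedPrecision}")
    println(s"Weighted recall:    ${metrics.weightedRecall}")
    // F-beta for class label 1.0; beta = 0.5 weights precision more than recall.
    println(s"F0.5 for label 1.0: ${metrics.fMeasure(1.0, 0.5)}")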

There definitely could be more metrics and/or better APIs to make it easier
to evaluate models on RDDs.  If you spot such cases, I'd recommend opening
up JIRAs for the new features or improvements to get some feedback before
sending PRs:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

Hope this helps & looking forward to the contributions!
Joseph
