Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/06/26 14:09:05 UTC
[jira] [Commented] (FLINK-2157) Create evaluation framework for ML library
[ https://issues.apache.org/jira/browse/FLINK-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602781#comment-14602781 ]
ASF GitHub Bot commented on FLINK-2157:
---------------------------------------
GitHub user thvasilo opened a pull request:
https://github.com/apache/flink/pull/871
[FLINK-2157] [ml] [WIP] Create evaluation framework for ML library
WIP PR for the model evaluation framework for FlinkML.
The evaluation follows sklearn's paradigm: a Scorer object is created with a performance score (what sklearn calls a metric) and provides an evaluate function that takes a trained model and a test dataset and produces a score.
The performance scores and Scorer are implemented in the flink.ml.evaluation package.
Currently we have squared loss, zero-one loss, an accuracy score for classification, and an R^2 score for regression.
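To make the paradigm concrete, here is a minimal sketch in plain Python of a Scorer built around the four scores listed above. It is not the flink.ml.evaluation API: every name (Scorer, squared_loss, zero_one_loss, accuracy_score, r2_score, ThresholdClassifier) is hypothetical, and the normalizations (mean squared loss, zero-one loss as a fraction) are assumptions borrowed from sklearn's defaults rather than taken from the PR.

```python
# Hypothetical sketch of the sklearn-style Scorer paradigm; not FlinkML code.

def squared_loss(y_true, y_pred):
    """Mean of squared differences (assumed normalization)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def zero_one_loss(y_true, y_pred):
    """Fraction of examples whose predicted label differs from the true one."""
    return sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)


def accuracy_score(y_true, y_pred):
    """Fraction of correctly predicted labels, i.e. 1 - zero-one loss."""
    return 1.0 - zero_one_loss(y_true, y_pred)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Raises ZeroDivisionError when all true values are identical.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


class Scorer:
    """Wraps a score function and applies it to a model on test data."""

    def __init__(self, score):
        self.score = score

    def evaluate(self, model, test_data):
        # test_data is a list of (features, true_label) pairs.
        y_true = [label for _, label in test_data]
        y_pred = [model.predict(features) for features, _ in test_data]
        return self.score(y_true, y_pred)


class ThresholdClassifier:
    """Toy stand-in for a trained model: predicts 1.0 when the feature > 0.5."""

    def predict(self, x):
        return 1.0 if x > 0.5 else 0.0


test_data = [(0.1, 0.0), (0.9, 1.0), (0.7, 0.0), (0.3, 0.0)]
accuracy = Scorer(accuracy_score).evaluate(ThresholdClassifier(), test_data)
print(accuracy)  # 0.75: three of the four labels are predicted correctly
```

Keeping the score as a plain function passed to the Scorer means a new metric can be added without touching the evaluation logic.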
Finally, a score function has been added to the regression algorithms (and will be added to classifiers as well). It provides an intuitive way to evaluate the performance of an algorithm without the need to create a Scorer, as per [FLINK-2108](https://issues.apache.org/jira/browse/FLINK-2108).
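As a rough illustration of such a convenience method, the plain-Python sketch below puts a score function directly on a regressor. The class names (Regressor, MeanRegressor) are hypothetical, and the choice of R^2 as the default score is an assumption modeled on sklearn; the PR text does not say which score the function uses.

```python
# Hypothetical sketch; not the FlinkML Regressor interface.

class Regressor:
    """Base class: subclasses provide predict(x)."""

    def predict(self, x):
        raise NotImplementedError

    def score(self, test_data):
        """R^2 on (features, true_value) pairs (assumed default score)."""
        y_true = [y for _, y in test_data]
        y_pred = [self.predict(x) for x, _ in test_data]
        mean = sum(y_true) / len(y_true)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean) ** 2 for t in y_true)
        return 1.0 - ss_res / ss_tot


class MeanRegressor(Regressor):
    """Toy baseline that always predicts the mean of the training targets."""

    def fit(self, training_data):
        targets = [y for _, y in training_data]
        self.mean = sum(targets) / len(targets)
        return self

    def predict(self, x):
        return self.mean


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
baseline_r2 = MeanRegressor().fit(data).score(data)
print(baseline_r2)  # 0.0: predicting the mean explains none of the variance
```

The caller gets a score with a single call on the fitted model and never has to construct a Scorer.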
The PR currently includes some work from Mikio Braun for a linear regression solver, but that will be moved to a separate PR.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/thvasilo/flink evaluation
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/871.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #871
----
commit ac373fb4af39d288c5b61bf1c86b1de5556748a6
Author: Till Rohrmann <tr...@apache.org>
Date: 2015-06-02T12:34:27Z
[FLINK-2116] [ml] Adds evaluate method to Predictor. Adds PredictOperation which can be reused by evaluate if the input data is of the format (TestingType, LabelType) where the second tuple field represents the true label.
commit 7133cafb643d545fa5c66bedc7d5eda847faeb62
Author: mikiobraun <mi...@gmail.com>
Date: 2015-06-09T11:25:34Z
First working version of a simpler least squares implementation
Not done any work integrating that with the Flink Pipeline stuff
commit f5315c0ce59b6a32c8aeb81ebba2a5982e981835
Author: mikiobraun <mi...@gmail.com>
Date: 2015-06-10T08:49:55Z
reduce amount of toString computations for large collections
commit 74aafa00e7e61003e081f9b54697ee9904487544
Author: mikiobraun <mi...@gmail.com>
Date: 2015-06-12T15:18:39Z
simple lsr into pipeline
commit f5c498ba1ba58a51f265f922fdce312518fcbf68
Author: mikiobraun <mi...@gmail.com>
Date: 2015-06-19T11:23:53Z
working on the Simple LSR tests
commit f37c41fc1d0b959c60c3e06f7d4633b57a7b87ac
Author: mikiobraun <mi...@gmail.com>
Date: 2015-06-19T14:32:54Z
slightly better checks in the SimpleLeastSquaresRegressionTest
commit aae27c2f25792143febb900a11f4980ca1159aae
Author: mikiobraun <mi...@gmail.com>
Date: 2015-06-22T15:04:42Z
Adding some first loss functions for the evaluation framework
commit 4d115f7db3e569655e2fb156f18ec897cd573089
Author: Theodore Vasiloudis <tv...@sics.se>
Date: 2015-06-23T14:07:48Z
Scorer for evaluation
commit 1e7309d7ba2519e2520ed816456cfa2ca8e92510
Author: Theodore Vasiloudis <tv...@sics.se>
Date: 2015-06-25T09:41:10Z
Adds accuracy score and R^2 score. Also trying out Scores as classes instead of functions.
Not too happy with the extra boilerplate of Scores as classes, so will probably revert,
and have objects like RegressionsScores, ClassificationScores that contain the definitions
of the relevant scores.
commit 3e275d567e2c4fe0b72875cfb54645dd346b4e22
Author: Theodore Vasiloudis <tv...@sics.se>
Date: 2015-06-26T11:30:56Z
Adds an evaluate operation for LabeledVector input
commit 8c194be4a39170cb7f4865ae1dd39ebbeeddef7e
Author: Theodore Vasiloudis <tv...@sics.se>
Date: 2015-06-26T11:32:13Z
Adds Regressor interface, and a score function for regression algorithms.
----
> Create evaluation framework for ML library
> ------------------------------------------
>
> Key: FLINK-2157
> URL: https://issues.apache.org/jira/browse/FLINK-2157
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Labels: ML
> Fix For: 0.10
>
>
> Currently, FlinkML lacks a means to evaluate the performance of trained models. It would be great to add some {{Evaluators}} that calculate a score from the true and predicted labels. This could also be used in cross-validation to choose the right hyperparameters.
> Possible scores could be F score [1], zero-one-loss score, etc.
> Resources
> [1] [http://en.wikipedia.org/wiki/F1_score]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)