Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/06/26 14:09:05 UTC
[jira] [Commented] (FLINK-2157) Create evaluation framework for ML library

    [ https://issues.apache.org/jira/browse/FLINK-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602781#comment-14602781 ] 

ASF GitHub Bot commented on FLINK-2157:
---------------------------------------

GitHub user thvasilo opened a pull request:

    https://github.com/apache/flink/pull/871

    [FLINK-2157] [ml] [WIP]  Create evaluation framework for ML library

    WIP PR for the model evaluation framework for FlinkML.
    
    The evaluation follows sklearn's paradigm: a Scorer object is created with a performance score (one of sklearn's metrics), and it provides an evaluate function that takes a trained model and a test dataset and produces a score.
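A minimal Python sketch of this paradigm follows. It mirrors scikit-learn's scorer idea and is not FlinkML's Scala API. `Scorer`, `mean_squared_error` and the toy `ConstantModel` are hypothetical names. The test data is assumed to be a plain list of (features, label) pairs rather than a Flink DataSet.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch: Scorer, mean_squared_error and ConstantModel are
# illustrative names, not FlinkML's Scala API. Test data is assumed to be a
# plain list of (features, label) pairs instead of a Flink DataSet.
ScoreFn = Callable[[Sequence[float], Sequence[float]], float]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


class Scorer:
    """Wraps a performance score and applies it to a trained model."""

    def __init__(self, score: ScoreFn) -> None:
        self.score = score

    def evaluate(self, model, test_data: List[Tuple[List[float], float]]) -> float:
        # Predict every test example, then compare predictions to true labels.
        y_true = [label for _, label in test_data]
        y_pred = [model.predict(features) for features, _ in test_data]
        return self.score(y_true, y_pred)


class ConstantModel:
    """Toy 'trained model' that always predicts the same value."""

    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self, features: List[float]) -> float:
        return self.value


scorer = Scorer(mean_squared_error)
test = [([1.0], 1.0), ([2.0], 3.0)]
print(scorer.evaluate(ConstantModel(2.0), test))  # 1.0
```

In this design the score function is just a value passed to the Scorer. Swapping metrics therefore only means constructing another Scorer; the model is untouched.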
    
    The performance scores and Scorer are implemented in the flink.ml.evaluation package.
    Currently we have squared loss, zero-one loss, the accuracy score for classification, and the R^2 score for regression.
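For reference, here are illustrative Python definitions of the scores listed above, written in scikit-learn's style. The function names are hypothetical. Averaging the squared loss over examples is an assumption, because the PR does not say whether the loss is summed or averaged.

```python
from typing import Sequence

# Illustrative definitions only; names follow scikit-learn conventions and are
# not FlinkML's API. Squared loss is averaged over examples (an assumption).


def squared_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of (y - y_hat)^2 over all examples."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def zero_one_loss(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of examples whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_score(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of correct predictions, i.e. 1 - zero_one_loss."""
    return 1.0 - zero_one_loss(y_true, y_pred)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Raises ZeroDivisionError when all true values are equal (SS_tot == 0).
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
```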
    
    Finally, a score function has been added to the regression algorithms (and will be added to the classifiers as well). It provides an intuitive way to evaluate an algorithm's performance without having to create a Scorer, as per [FLINK-2108](https://issues.apache.org/jira/browse/FLINK-2108).
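The sketch below illustrates such a convenience method in Python. It assumes the default metric is R^2, as in scikit-learn's regressors; the PR does not say which metric FlinkML's score function uses. `Regressor`, `score` and the toy `SlopeModel` are hypothetical names.

```python
from typing import List, Sequence, Tuple


class Regressor:
    """Hypothetical base class: subclasses implement predict(features)."""

    def predict(self, features: List[float]) -> float:
        raise NotImplementedError

    def score(self, test_data: Sequence[Tuple[List[float], float]]) -> float:
        # Convenience method: scores the model on (features, label) pairs
        # without the caller building a Scorer. Assumes R^2 as the default.
        y_true = [label for _, label in test_data]
        y_pred = [self.predict(features) for features, _ in test_data]
        mean = sum(y_true) / len(y_true)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean) ** 2 for t in y_true)
        return 1.0 - ss_res / ss_tot


class SlopeModel(Regressor):
    """Toy regressor y = w * x on a single feature."""

    def __init__(self, w: float) -> None:
        self.w = w

    def predict(self, features: List[float]) -> float:
        return self.w * features[0]


test = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]
print(SlopeModel(2.0).score(test))  # 1.0 (perfect fit)
```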
    
    The PR currently includes some work by Mikio Braun on a linear regression solver, which will be moved to a separate PR.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thvasilo/flink evaluation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/871.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #871
    
----
commit ac373fb4af39d288c5b61bf1c86b1de5556748a6
Author: Till Rohrmann <tr...@apache.org>
Date:   2015-06-02T12:34:27Z

    [FLINK-2116] [ml] Adds evaluate method to Predictor. Adds PredictOperation which can be reused by evaluate if the input data is of the format (TestingType, LabelType) where the second tuple field represents the true label.

commit 7133cafb643d545fa5c66bedc7d5eda847faeb62
Author: mikiobraun <mi...@gmail.com>
Date:   2015-06-09T11:25:34Z

    First working version of a simpler least squares implementation
    
    Not done any work integrating that with the Flink Pipeline stuff

commit f5315c0ce59b6a32c8aeb81ebba2a5982e981835
Author: mikiobraun <mi...@gmail.com>
Date:   2015-06-10T08:49:55Z

    reduce amount of toString computations for large collections

commit 74aafa00e7e61003e081f9b54697ee9904487544
Author: mikiobraun <mi...@gmail.com>
Date:   2015-06-12T15:18:39Z

    simple lsr into pipeline

commit f5c498ba1ba58a51f265f922fdce312518fcbf68
Author: mikiobraun <mi...@gmail.com>
Date:   2015-06-19T11:23:53Z

    working on the Simple LSR tests

commit f37c41fc1d0b959c60c3e06f7d4633b57a7b87ac
Author: mikiobraun <mi...@gmail.com>
Date:   2015-06-19T14:32:54Z

    slightly better checks in the SimpleLeastSquaresRegressionTest

commit aae27c2f25792143febb900a11f4980ca1159aae
Author: mikiobraun <mi...@gmail.com>
Date:   2015-06-22T15:04:42Z

    Adding some first loss functions for the evaluation framework

commit 4d115f7db3e569655e2fb156f18ec897cd573089
Author: Theodore Vasiloudis <tv...@sics.se>
Date:   2015-06-23T14:07:48Z

    Scorer for evaluation

commit 1e7309d7ba2519e2520ed816456cfa2ca8e92510
Author: Theodore Vasiloudis <tv...@sics.se>
Date:   2015-06-25T09:41:10Z

    Adds accuracy score and R^2 score. Also trying out Scores as classes instead of functions.
    
    Not too happy with the extra boilerplate of Scores as classes; will probably revert
    and have objects like RegressionsScores, ClassificationScores that contain the definitions
    of the relevant scores.

commit 3e275d567e2c4fe0b72875cfb54645dd346b4e22
Author: Theodore Vasiloudis <tv...@sics.se>
Date:   2015-06-26T11:30:56Z

    Adds an evaluate operation for LabeledVector input

commit 8c194be4a39170cb7f4865ae1dd39ebbeeddef7e
Author: Theodore Vasiloudis <tv...@sics.se>
Date:   2015-06-26T11:32:13Z

    Adds Regressor interface, and a score function for regression algorithms.

----
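The first commit above describes an evaluate method that reuses the prediction operation when the test data has the form (TestingType, LabelType), where the second tuple field is the true label. The following is a minimal Python sketch of that idea. It assumes the output is a list of (true label, prediction) pairs. `evaluate` here is a hypothetical plain function, not the Scala `Predictor` signature, and a Python list stands in for a Flink DataSet.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")  # plays the role of TestingType
L = TypeVar("L")  # plays the role of LabelType


def evaluate(
    predict: Callable[[T], L], test_data: Iterable[Tuple[T, L]]
) -> List[Tuple[L, L]]:
    """Pairs each true label with the model's prediction for the same input.

    The second field of each input tuple is treated as the true label, and the
    same predict function used for unlabeled data is reused unchanged.
    """
    return [(label, predict(features)) for features, label in test_data]


pairs = evaluate(lambda x: x * 2, [(1, 2), (2, 5)])
print(pairs)  # [(2, 2), (5, 4)]
```

The resulting (truth, prediction) pairs are exactly what a score function needs as input. This is what lets scorers stay independent of the model type.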


> Create evaluation framework for ML library
> ------------------------------------------
>
>                 Key: FLINK-2157
>                 URL: https://issues.apache.org/jira/browse/FLINK-2157
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>              Labels: ML
>             Fix For: 0.10
>
>
> Currently, FlinkML lacks the means to evaluate the performance of trained models. It would be great to add some {{Evaluators}} that calculate a score from the true and predicted labels. These could also be used in cross-validation to choose the right hyperparameters.
> Possible scores include the F score [1], the zero-one loss, etc.
> Resources
> [1] [http://en.wikipedia.org/wiki/F1_score]
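Reference [1] defines the F1 score as the harmonic mean of precision and recall. Below is a minimal binary sketch in Python. The `f1_score` name and its `positive` parameter are hypothetical, not a FlinkML API. Returning 0.0 when there are no positive examples or predictions is an assumed convention.

```python
from typing import Hashable, Sequence


def f1_score(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1
) -> float:
    """Binary F1 score for the `positive` class.

    Computed as 2*TP / (2*TP + FP + FN), which equals the harmonic mean
    2 * precision * recall / (precision + recall).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denominator = 2 * tp + fp + fn
    # Assumption: return 0.0 when there are no positives at all (F1 undefined).
    return 2 * tp / denominator if denominator else 0.0


print(f1_score([1, 1, 1, 0], [1, 1, 0, 0]))  # 0.8
```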



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)