Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2017/10/31 18:37:00 UTC

[jira] [Created] (MADLIB-1169) Change how cross validation stats are reported and improve user docs

Frank McQuillan created MADLIB-1169:
---------------------------------------

             Summary: Change how cross validation stats are reported and improve user docs 
                 Key: MADLIB-1169
                 URL: https://issues.apache.org/jira/browse/MADLIB-1169
             Project: Apache MADlib
          Issue Type: Improvement
          Components: Module: Regularized Regression
            Reporter: Frank McQuillan
             Fix For: v1.13


Context

Currently in cross validation, e.g., for elastic net
http://madlib.apache.org/docs/latest/group__grp__elasticnet.html
CV stats are reported as follows:

alpha | lambda_value |        mean         |     std
------+--------------+---------------------+--------------------
    0 |       100000 | -1.41777698585e+110 | 1.80536123195e+110
  0.1 |       100000 | -1.19953054719e+107 | 1.72846143163e+107
    1 |       100000 |      -4175743937.91 |      2485189261.38
etc.

Here the "mean" column is the negative of the loss (a kind of accuracy score, where values closer to zero are better), so the column header does not explain what is being reported.
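
A minimal sketch of what the current "mean" and "std" columns contain, assuming the per-fold loss is the mean squared error (MSE) on the held-out fold and that the sample standard deviation is taken across folds. The function name and fold data are hypothetical; this is not MADlib's code.

```python
import statistics

def cv_neg_mse_stats(folds):
    """Hypothetical sketch: folds is a list of (y_true, y_pred) pairs,
    one per held-out fold. Returns (mean, std) of the negated per-fold MSE."""
    neg_losses = []
    for y_true, y_pred in folds:
        mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
        neg_losses.append(-mse)  # negate so that "larger is better"
    # Sample standard deviation across folds (assumed; MADlib may differ).
    return statistics.mean(neg_losses), statistics.stdev(neg_losses)

folds = [([1.0, 2.0], [1.5, 2.5]), ([3.0, 4.0], [2.0, 4.0])]
mean, std = cv_neg_mse_stats(folds)  # mean == -0.375
```

Because MSE is in squared units of the target, large-valued data or a poorly fitting model quickly produces huge negative numbers like the ones in the table above.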

Story

As a MADlib developer, I want to make it clear what the reported CV stats are, so that users are not confused about what they mean.

Acceptance

1) Change the calculation to report RMSE instead of MSE, so that the value is smaller and on the same scale as the data. We can still report it as a negative number, but computed from RMSE.

2) Rename the columns to "mean_neg_loss" and "std_neg_loss".

3) Improve the user docs to explain what the columns mean for regression and for classification.

4) Update any IC or functional/Tinc tests that are affected.

5) Update the example in
http://madlib.apache.org/docs/latest/group__grp__elasticnet.html
and also find any other modules that need to be updated (SVM?).

6) Check the log_likelihood value reported by elastic net (EN). Is it really a log likelihood, and is it reported correctly?
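
A minimal sketch of acceptance items 1 to 3 under stated assumptions: for regression the per-fold loss is RMSE, as item 1 proposes; for classification the misclassification rate is used purely for illustration, since this issue does not say which classification loss MADlib uses. The helper names fold_loss and cv_report_row are hypothetical; only the column names mean_neg_loss and std_neg_loss come from item 2.

```python
import math
import statistics

def fold_loss(task, y_true, y_pred):
    """Per-fold loss: RMSE for regression (item 1); misclassification
    rate for classification (an illustrative assumption)."""
    n = len(y_true)
    if task == "regression":
        return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    if task == "classification":
        return sum(t != p for t, p in zip(y_true, y_pred)) / n
    raise ValueError(f"unknown task: {task}")

def cv_report_row(task, folds):
    """One row of the CV report using the column names proposed in item 2.
    Losses are negated, so values closer to zero are better."""
    neg = [-fold_loss(task, y, yhat) for y, yhat in folds]
    return {"mean_neg_loss": statistics.mean(neg),
            "std_neg_loss": statistics.stdev(neg)}  # sample std across folds

reg = cv_report_row("regression", [([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]),
                                   ([5.0, 5.0], [5.0, 7.0])])
clf = cv_report_row("classification", [([1, 0, 1, 1], [1, 0, 0, 1]),
                                       ([0, 0, 1, 1], [0, 0, 1, 1])])
```

Because RMSE is in the same units as the target, scaling the data by 1000 scales the regression loss by 1000, whereas an MSE-based value would grow by a factor of 1,000,000.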
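
As a reference point for item 6, a sketch of two quantities a reported "log_likelihood" could be compared against. The first is the standard maximized Gaussian log-likelihood of a linear model, with the noise variance set to its maximum-likelihood estimate RSS/n, which gives log L = -(n/2)(log(2*pi*RSS/n) + 1). The second is a penalized elastic net objective in a glmnet-style parametrization; that parametrization is an assumption to check against the MADlib docs. The penalty term is not part of any likelihood, so a value that includes it would be a penalized objective rather than a log-likelihood. Both function names are hypothetical.

```python
import math

def gaussian_log_likelihood(y_true, y_pred):
    """Maximized Gaussian log-likelihood with sigma^2 = RSS / n (the MLE)."""
    n = len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)

def elastic_net_objective(y_true, y_pred, beta, alpha, lam):
    """Assumed glmnet-style objective:
    RSS / (2n) + lam * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1)."""
    n = len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    penalty = ((1 - alpha) / 2 * sum(b * b for b in beta)
               + alpha * sum(abs(b) for b in beta))
    return rss / (2 * n) + lam * penalty
```

Evaluating both on a small dataset and comparing them with the value EN reports is one way to see which quantity is actually being returned.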

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)