Posted to issues@spark.apache.org by "Teng Peng (JIRA)" <ji...@apache.org> on 2017/11/03 02:28:00 UTC

[jira] [Created] (SPARK-22433) Linear regression R^2 train/test terminology related

Teng Peng created SPARK-22433:
---------------------------------

             Summary: Linear regression R^2 train/test terminology related 
                 Key: SPARK-22433
                 URL: https://issues.apache.org/jira/browse/SPARK-22433
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.2.0
            Reporter: Teng Peng
            Priority: Minor


Traditional statistics is its own discipline: its goals, framework, and terminology are not the same as those of ML. However, in the linear-regression-related components this distinction is not clear, as reflected in the following:
1. regressionMetric + regressionEvaluator : 
* R2 shouldn't be there. 
* A better name would be "regressionPredictionMetric".
2. LinearRegressionSuite:
* Shouldn't test R2 and residuals on test data. 
* There is no train set and test set in this setting.
3. Terminology: there is no "linear regression with L1 regularization". Linear regression is linear. Once a penalty term is added, it is no longer linear. Just call it "LASSO" or "ElasticNet".
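
To make the concern in point 2 concrete, here is a minimal sketch in plain Python (not Spark code). The helper names fit_simple_ols and r_squared and the toy numbers are hypothetical, chosen only for illustration. It fits ordinary least squares with an intercept and evaluates R^2 = 1 - SS_res / SS_tot once on the data used for fitting and once on separate data. On the fitting data the value lies in [0, 1] and can be read as the fraction of variance explained; on other data that reading no longer holds, and the value can even be negative.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares fit of y = a + b * x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: the second set follows a different pattern than the first.
x_fit, y_fit = [0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.1, 2.9]
x_other, y_other = [10.0, 11.0, 12.0], [5.0, 5.1, 4.9]

a, b = fit_simple_ols(x_fit, y_fit)
r2_fit = r_squared(y_fit, [a + b * xi for xi in x_fit])
r2_other = r_squared(y_other, [a + b * xi for xi in x_other])

print("R^2 on the data used for fitting:", r2_fit)
print("R^2 on separate data:", r2_other)
```

This only illustrates why the two uses of R^2 are not interchangeable; it does not describe how RegressionMetrics or RegressionEvaluator compute the value internally.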

There are more. I am working on correcting them.

None of these break anything, but it is unsatisfying to see this basic distinction blurred.


