Posted to issues@spark.apache.org by "Teng Peng (JIRA)" <ji...@apache.org> on 2017/11/03 02:28:00 UTC

[jira] [Updated] (SPARK-22433) Linear regression R^2 train/test terminology related

     [ https://issues.apache.org/jira/browse/SPARK-22433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teng Peng updated SPARK-22433:
------------------------------
    Description: 
Traditional statistics and ML are different fields: their goals, frameworks, and terminology are not the same. However, the linear regression related components do not keep this distinction clear, as reflected in the following:
1. RegressionMetrics + RegressionEvaluator:
* R2 shouldn't be there.
* A better name would be "regressionPredictionMetric".

2. LinearRegressionSuite:
* It shouldn't test R2 and residuals on test data.
* There is no train set and test set in this setting.

3. Terminology: there is no "linear regression with L1 regularization". Linear regression is linear. Once a penalty term is added, it is no longer linear. Just call it "LASSO" or "ElasticNet".
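
One possible reading of "no longer linear" in item 3 is that the fitted coefficient stops being a linear function of the response once an L1 penalty is added. The sketch below is plain Python, not Spark code, and it assumes a single feature scaled to unit norm, in which case the L1-penalized solution is the soft-thresholded least-squares coefficient. The names `ols_coef`, `lasso_coef`, `soft_threshold` and the toy numbers are hypothetical.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5 * (z - b) ** 2 + lam * abs(b) over b."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def ols_coef(x, y):
    """Least-squares coefficient (no intercept), assuming sum(x_i ** 2) == 1."""
    return sum(xi * yi for xi, yi in zip(x, y))

def lasso_coef(x, y, lam):
    """L1-penalized coefficient under the same unit-norm assumption."""
    return soft_threshold(ols_coef(x, y), lam)

x = [0.6, 0.8]            # unit-norm feature (0.36 + 0.64 = 1)
y1 = [1.0, 0.0]
y2 = [0.0, 1.0]
y_sum = [a + b for a, b in zip(y1, y2)]
lam = 0.5                 # hypothetical penalty strength

# The least-squares coefficient is additive in the response ...
ols_gap = ols_coef(x, y_sum) - (ols_coef(x, y1) + ols_coef(x, y2))
# ... while the L1-penalized coefficient is not.
lasso_gap = lasso_coef(x, y_sum, lam) - (lasso_coef(x, y1, lam) + lasso_coef(x, y2, lam))

print("OLS additivity gap:  ", ols_gap)
print("LASSO additivity gap:", lasso_gap)
```

The model is still linear in the features in both cases; only the map from the response to the estimate changes, which is one way to motivate the separate names.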

There are more. I am working on correcting them.

None of these break anything, but it is uncomfortable to see this basic distinction blurred.
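
To illustrate the concern in item 2, here is a minimal sketch in plain Python (not Spark's implementation) that computes R2 with the usual definition 1 - SS_res / SS_tot, first on the data used for fitting and then on separate data. The helper names `fit_ols` and `r_squared` and the toy numbers are hypothetical. With an intercept, the in-sample value lies between 0 and 1, while the same formula on other data can be arbitrarily negative, so the two numbers do not carry the same interpretation.

```python
def fit_ols(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(ys, preds):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of ys."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Toy data, made up for illustration only.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]
test_x, test_y = [5.0, 6.0, 7.0], [3.0, 2.0, 1.0]

slope, intercept = fit_ols(train_x, train_y)

train_r2 = r_squared(train_y, [slope * x + intercept for x in train_x])
test_r2 = r_squared(test_y, [slope * x + intercept for x in test_x])

print("R2 on fitting data:", train_r2)
print("R2 on other data:  ", test_r2)
```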


> Linear regression R^2 train/test terminology related 
> -----------------------------------------------------
>
>                 Key: SPARK-22433
>                 URL: https://issues.apache.org/jira/browse/SPARK-22433
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Teng Peng
>            Priority: Minor


