Posted to issues@spark.apache.org by "Teng Peng (JIRA)" <ji...@apache.org> on 2017/11/03 02:28:00 UTC
[jira] [Updated] (SPARK-22433) Linear regression R^2 train/test terminology related
[ https://issues.apache.org/jira/browse/SPARK-22433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Teng Peng updated SPARK-22433:
------------------------------
Description:
Traditional statistics is its own discipline: its goals, framework, and terminology are not the same as ML's. However, in the linear regression related components this distinction is not clear, as reflected in the following:
1. RegressionMetrics + RegressionEvaluator:
* R2 shouldn't be there.
* A better name "regressionPredictionMetric".
2. LinearRegressionSuite:
* Shouldn't test R2 and residuals on test data.
* There is no train set and test set in this setting.
3. Terminology: there is no "linear regression with L1 regularization". Linear regression is linear. Once a penalty term is added, it is no longer linear. Just call it "LASSO" or "ElasticNet".
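For reference, the objectives behind item 3 can be written side by side. The notation here (design matrix X, response y, coefficients w, n rows, penalty weight \lambda, mixing weight \alpha) is an assumption added for illustration and does not appear in the original report; the 1/(2n) scaling follows the convention used in the Spark ML documentation, where \lambda and \alpha correspond to regParam and elasticNetParam.

```latex
\begin{aligned}
\text{OLS:}         \quad & \min_{w}\ \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \\
\text{LASSO:}       \quad & \min_{w}\ \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1 \\
\text{Elastic net:} \quad & \min_{w}\ \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
    + \lambda \Bigl( \alpha \lVert w \rVert_1 + \frac{1-\alpha}{2} \lVert w \rVert_2^2 \Bigr)
\end{aligned}
```

In all three the prediction is still Xw. What the L1 penalty changes is the estimator: the OLS solution is a linear function of y, while the LASSO solution in general is not.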
There are more. I am working on correcting them.
None of these break anything, but it does not feel good to see such a basic distinction blurred.
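To illustrate item 2, here is a minimal sketch in plain Python (not Spark code) of why R2 behaves differently on the data used for fitting and on held-out data. The data points and the helper names fit_simple_ols and r_squared are made up for illustration. For an OLS fit with an intercept, R2 on the fitting data lies between 0 and 1 and can be read as the fraction of variance explained; the same formula on other data has no such guarantee and can be negative.

```python
def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(ys, preds):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of ys."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: the held-out points follow a different trend
# than the points used for fitting.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 1.0, 2.0, 6.0]
test_x = [4.0, 5.0, 6.0]
test_y = [3.0, 3.5, 4.0]

intercept, slope = fit_simple_ols(train_x, train_y)


def predict(x):
    return intercept + slope * x


r2_train = r_squared(train_y, [predict(x) for x in train_x])
r2_test = r_squared(test_y, [predict(x) for x in test_x])

print("R2 on fitting data: ", r2_train)  # between 0 and 1
print("R2 on held-out data:", r2_test)   # negative for this toy data
```

On held-out data the quantity is really a prediction-error score measured against a mean-only baseline, which is one way to read the report's point that it should not be treated as the classical R2.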
> Linear regression R^2 train/test terminology related
> -----------------------------------------------------
>
> Key: SPARK-22433
> URL: https://issues.apache.org/jira/browse/SPARK-22433
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.2.0
> Reporter: Teng Peng
> Priority: Minor