Posted to issues@spark.apache.org by "Teng Peng (JIRA)" <ji...@apache.org> on 2017/11/03 02:28:00 UTC
[jira] [Created] (SPARK-22433) Linear regression R^2 train/test terminology related
Teng Peng created SPARK-22433:
---------------------------------
Summary: Linear regression R^2 train/test terminology related
Key: SPARK-22433
URL: https://issues.apache.org/jira/browse/SPARK-22433
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 2.2.0
Reporter: Teng Peng
Priority: Minor
Traditional statistics and ML are different disciplines: their goals, frameworks, and terminology are not the same. However, in the linear regression related components this distinction is not clear, which shows up in the following places:
1. RegressionMetrics + RegressionEvaluator:
* R2 shouldn't be there.
* A better name would be "regressionPredictionMetric".
2. LinearRegressionSuite:
* Shouldn't test R2 and residuals on test data.
* There is no train set and test set in this setting.
3. Terminology: there is no "linear regression with L1 regularization". Linear regression is linear; once a penalty term is added, it is no longer linear. Just call it "LASSO" or "ElasticNet".
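A minimal sketch of the concern behind points 1 and 2, written in plain Python rather than against the Spark API. The helper names (fit_simple_ols, r_squared) and the data are made up for illustration. It assumes the usual definition R2 = 1 - SS_res / SS_tot: for a least-squares fit with an intercept, this lies between 0 and 1 on the data used for fitting, but on other data it can fall below 0, so it no longer reads as "fraction of variance explained".

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(ys, preds):
    """Compute 1 - SS_res / SS_tot, with SS_tot taken around the mean of ys."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: the model is fitted on the first set only.
fit_x, fit_y = [1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]
new_x, new_y = [5.0, 6.0, 7.0], [4.0, 3.0, 4.0]

intercept, slope = fit_simple_ols(fit_x, fit_y)

# R2 on the data the line was fitted to.
in_sample_r2 = r_squared(fit_y, [intercept + slope * x for x in fit_x])

# The same formula applied to data the line never saw.
out_of_sample_r2 = r_squared(new_y, [intercept + slope * x for x in new_x])

print(in_sample_r2, out_of_sample_r2)
```

On the fitted data the value stays in the familiar range, while on the second set it is negative. For judging predictions on unseen data, an error measure such as RMSE or MAE is easier to interpret.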
There are more. I am working on correcting them.
None of these break anything, but it is uncomfortable to see such a basic distinction blurred.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)