Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2017/02/07 19:24:41 UTC

[jira] [Commented] (SPARK-17139) Add model summary for MultinomialLogisticRegression

    [ https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856612#comment-15856612 ] 

Joseph K. Bradley commented on SPARK-17139:
-------------------------------------------

I'll offer a few thoughts first:
* A "ClassificationSummary" could be the same as a "MulticlassClassificationSummary" because binary classification is a special case of multiclass classification.
* Following the structure of abstractions for Prediction is reasonable.
* Separating binary and multiclass is reasonable; the separation is more significant for evaluation than for the Prediction abstractions.
* Abstract classes have been a pain in the case of Prediction abstractions, so I'd prefer we use traits.

The 2 alternatives I see are:
1. BinaryClassificationSummary inherits from ClassificationSummary.  No separate MulticlassClassificationSummary.
2. BinaryClassificationSummary and MulticlassClassificationSummary inherit from ClassificationSummary.

Both alternatives are semantically reasonable.  However, since ClassificationSummary would be functionally identical to MulticlassClassificationSummary, and since the Prediction abstractions combine binary and multiclass, I prefer option 1.
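A minimal Scala sketch of option 1, under stated assumptions: the members predictionAndLabels, accuracy and falsePositiveRate are hypothetical, and metrics are computed from in-memory (prediction, label) pairs rather than a DataFrame so it runs without Spark. It illustrates why a separate multiclass trait adds nothing: the general summary's metrics already work with two labels, so the binary trait only contributes binary-specific metrics.

```scala
// Hypothetical sketch of option 1; trait members are illustrative only, not Spark's API.
trait ClassificationSummary {
  // (prediction, label) pairs standing in for a predictions DataFrame
  def predictionAndLabels: Seq[(Double, Double)]

  // General (multiclass) metric: valid for any number of classes, including 2
  def accuracy: Double =
    predictionAndLabels.count { case (p, l) => p == l }.toDouble / predictionAndLabels.size
}

// Option 1: the binary summary adds only binary-specific metrics on top of the general one
trait BinaryClassificationSummary extends ClassificationSummary {
  // Fraction of label-0.0 rows predicted as 1.0
  def falsePositiveRate: Double = {
    val negatives = predictionAndLabels.filter { case (_, l) => l == 0.0 }
    negatives.count { case (p, _) => p == 1.0 }.toDouble / negatives.size
  }
}

// Option 2 would add a sibling that contributes no new functionality:
// trait MulticlassClassificationSummary extends ClassificationSummary

val binary = new BinaryClassificationSummary {
  val predictionAndLabels: Seq[(Double, Double)] =
    Seq((1.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.0, 0.0))
}
```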

If we go with option 1, then we need 4 concrete classes:
* LogisticRegressionSummary
* LogisticRegressionTrainingSummary
* BinaryLogisticRegressionSummary
* BinaryLogisticRegressionTrainingSummary

We would definitely want binary summaries to inherit from their multiclass counterparts, and training summaries to inherit from their general counterparts:
* LogisticRegressionSummary
* LogisticRegressionTrainingSummary: LogisticRegressionSummary
* BinaryLogisticRegressionSummary: LogisticRegressionSummary
* BinaryLogisticRegressionTrainingSummary: LogisticRegressionTrainingSummary, BinaryLogisticRegressionSummary

Of course, this is a problem if these are all classes: BinaryLogisticRegressionTrainingSummary would need to extend two classes, and Scala does not allow a class to extend more than one class.  But we could solve it by making all of these traits, with concrete classes mixing them in.  I.e., {{LogisticRegressionModel.summary}} could return {{trait LogisticRegressionTrainingSummary}}, whose concrete type would be either {{LogisticRegressionTrainingSummaryImpl}} (multiclass) or {{BinaryLogisticRegressionTrainingSummaryImpl}} (binary).
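A minimal Scala sketch of that trait layout, assuming hypothetical members (predictionAndLabels, accuracy, truePositiveRate, objectiveHistory, totalIterations) and a hypothetical makeSummary factory standing in for LogisticRegressionModel.summary. Metrics are computed from in-memory (prediction, label) pairs instead of a DataFrame so it runs without Spark; it is an illustration of the proposal, not Spark's implementation.

```scala
// Hypothetical sketch of the proposed trait hierarchy; not Spark's actual API.
trait LogisticRegressionSummary {
  // (prediction, label) pairs standing in for the predictions DataFrame
  def predictionAndLabels: Seq[(Double, Double)]

  // Multiclass metric: valid for any number of classes, including 2
  def accuracy: Double =
    predictionAndLabels.count { case (p, l) => p == l }.toDouble / predictionAndLabels.size
}

trait LogisticRegressionTrainingSummary extends LogisticRegressionSummary {
  // Loss value recorded at each training iteration
  def objectiveHistory: Array[Double]
  def totalIterations: Int = objectiveHistory.length
}

trait BinaryLogisticRegressionSummary extends LogisticRegressionSummary {
  // Binary-only metric (stand-in for ROC-style metrics): recall of label 1.0
  def truePositiveRate: Double = {
    val positives = predictionAndLabels.filter { case (_, l) => l == 1.0 }
    positives.count { case (p, _) => p == 1.0 }.toDouble / positives.size
  }
}

// The "diamond" is legal here because both parents are traits
trait BinaryLogisticRegressionTrainingSummary
  extends LogisticRegressionTrainingSummary with BinaryLogisticRegressionSummary

// Concrete classes; in Spark these would have private constructors
class LogisticRegressionTrainingSummaryImpl(
    val predictionAndLabels: Seq[(Double, Double)],
    val objectiveHistory: Array[Double])
  extends LogisticRegressionTrainingSummary

class BinaryLogisticRegressionTrainingSummaryImpl(
    val predictionAndLabels: Seq[(Double, Double)],
    val objectiveHistory: Array[Double])
  extends BinaryLogisticRegressionTrainingSummary

// Hypothetical stand-in for LogisticRegressionModel.summary: the static type is always the general trait
def makeSummary(
    numClasses: Int,
    predictionAndLabels: Seq[(Double, Double)],
    objectiveHistory: Array[Double]): LogisticRegressionTrainingSummary =
  if (numClasses == 2) new BinaryLogisticRegressionTrainingSummaryImpl(predictionAndLabels, objectiveHistory)
  else new LogisticRegressionTrainingSummaryImpl(predictionAndLabels, objectiveHistory)

// Usage: binary-only metrics require matching on the runtime type
val s = makeSummary(2, Seq((1.0, 1.0), (0.0, 1.0), (0.0, 0.0), (1.0, 1.0)), Array(0.7, 0.5, 0.4))
s match {
  case b: BinaryLogisticRegressionTrainingSummary =>
    println(s"accuracy=${b.accuracy}, TPR=${b.truePositiveRate}")
  case m =>
    println(s"accuracy=${m.accuracy}")
}
```

In this sketch, callers that want binary-only metrics have to match on (or cast) the returned trait; that is the price of having the method's declared return type be the general training summary.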

I suspect MiMa will complain about this, but IIRC it's safe since all of these summaries have private constructors and can't be extended outside of Spark.

What do you think?

> Add model summary for MultinomialLogisticRegression
> ---------------------------------------------------
>
>                 Key: SPARK-17139
>                 URL: https://issues.apache.org/jira/browse/SPARK-17139
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Seth Hendrickson
>
> Add model summary to multinomial logistic regression using same interface as in other ML models.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
