Posted to dev@ignite.apache.org by Yuriy Babak <y....@gmail.com> on 2018/12/13 10:26:18 UTC

[ML] Metric calculation for classification models

Igniters, Alexey

I want to discuss ticket IGNITE-10371 [1]. Currently, we compute the four
confusion counters (true positives, true negatives, false positives, false
negatives) separately for each "point metric" (accuracy, recall, F-score,
and precision) and for each label.

So for the full score we compute those four counters eight times, even
though a single pass over the data is enough to calculate all eight metrics
(four for the first label and four for the second label).

I suggest introducing a new API that distinguishes "point metrics" like
those four (accuracy, recall, F-score, and precision) from "integral
metrics" like ROC AUC [2].

Any thoughts would be appreciated.

[1] - https://issues.apache.org/jira/browse/IGNITE-10371
[2] - https://issues.apache.org/jira/browse/IGNITE-10145
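
For illustration, the split could look roughly like this. This is only a
sketch; all interface and method names here are my assumptions, not the
final API:

```java
// Hypothetical sketch of the proposed point/integral split; the names
// are illustrative, not the final Ignite API.
interface PointMetric {
    // Point metrics need only the four confusion counters.
    double calculate(long tp, long tn, long fp, long fn);
}

interface IntegralMetric {
    // Integral metrics such as ROC AUC need the per-sample scores and
    // true labels, not just the aggregated counters.
    double calculate(double[] scores, boolean[] labels);
}

class Precision implements PointMetric {
    @Override public double calculate(long tp, long tn, long fp, long fn) {
        // Guard against division by zero when no positives were predicted.
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }
}
```

With this split, the four counters can be computed once and every point
metric derived from them, while integral metrics keep their own, more
expensive input.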

Re: [ML] Metric calculation for classification models

Posted by Alexey Zinoviev <za...@gmail.com>.
Please have a look at the new version in my PR, where I've implemented the
approach described above:
https://github.com/apache/ignite/pull/5612




Re: [ML] Metric calculation for classification models

Posted by Yuriy Babak <y....@gmail.com>.
Dmitriy,

Sure, all changes in the ML module will be described on the readme.io site
with the next release (2.8).

Best regards,
Yuriy Babak



Re: [ML] Metric calculation for classification models

Posted by Dmitriy Pavlov <dp...@apache.org>.
Folks, I sometimes hear complaints about the metrics and their clarity for
end users.

Would you add a couple of words about each value to the wiki/readme.io?


Re: [ML] Metric calculation for classification models

Posted by Alexey Zinoviev <za...@gmail.com>.
So, I agree that we should avoid inefficient metric calculations.
I think that in the 2.8 release we should have:

   1. BinaryClassificationMetric with all the metrics from Wikipedia
   2. A Metric interface with one or two implementations, e.g. ROC AUC and
   accuracy, in the example folder or in the metric package
   3. BinaryClassificationMetric and MultiClassClassificationMetrics, both
   implementing a new MetricGroup interface

I will rework the current PR according to your recommendations.
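
A minimal sketch of how the pieces in that list might fit together. Only
the type names Metric, MetricGroup, and BinaryClassificationMetric come
from the list above; the method shapes are my assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed layout; method shapes are assumptions.
// A single metric is scored from the shared confusion counters.
interface Metric {
    double score(long tp, long tn, long fp, long fn);
}

// Item 3: a group bundles related metrics computed from the same counters.
interface MetricGroup {
    Map<String, Double> scoreAll(long tp, long tn, long fp, long fn);
}

class Accuracy implements Metric {
    @Override public double score(long tp, long tn, long fp, long fn) {
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }
}

class BinaryClassificationMetric implements MetricGroup {
    @Override public Map<String, Double> scoreAll(long tp, long tn, long fp, long fn) {
        Map<String, Double> res = new LinkedHashMap<>();
        // All point metrics derived from one set of counters.
        res.put("accuracy", new Accuracy().score(tp, tn, fp, fn));
        res.put("precision", tp + fp == 0 ? 0.0 : (double) tp / (tp + fp));
        res.put("recall", tp + fn == 0 ? 0.0 : (double) tp / (tp + fn));
        return res;
    }
}
```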


Re: [ML] Metric calculation for classification models

Posted by Алексей Платонов <ap...@gmail.com>.
You can compute just the TP (true-positive), FP, TN and FN counters and use
them to evaluate recall, precision, accuracy, etc. If you want to specify
the class for precision evaluation, you can compute precision for the first
label as TP/(TP+FP) and for the second label as TN/(TN+FN), for example.
That way we can unify the evaluation of all one-point metrics.

In my opinion, we can redesign the metric calculation to provide one-point
metrics (like precision and recall) and integral metrics like ROC AUC,
where the one-point metrics are calculated from TP, FP, etc.

Maybe you should design a class BinaryClassificationMetric that computes
these counters and provides methods like recall :: () -> double,
precision :: () -> double, etc.
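
The design above could be sketched roughly as follows. The class name and
the recall/precision methods follow the suggestion above; the other method
names and the update signature are my assumptions:

```java
// Hypothetical sketch: accumulate the four counters in one pass,
// then derive every one-point metric from them.
class BinaryClassificationMetric {
    private long tp, tn, fp, fn;

    // Fold one (truth, prediction) pair into the counters.
    void update(boolean truth, boolean prediction) {
        if (truth && prediction) tp++;
        else if (!truth && !prediction) tn++;
        else if (!truth && prediction) fp++;
        else fn++;
    }

    double accuracy() { return (double) (tp + tn) / (tp + tn + fp + fn); }

    // Precision for the first label: TP / (TP + FP).
    double precision() { return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp); }

    // Precision for the second label: TN / (TN + FN), as suggested above.
    double precisionForSecondLabel() { return tn + fn == 0 ? 0.0 : (double) tn / (tn + fn); }

    double recall() { return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn); }

    double f1() {
        double p = precision(), r = recall();
        return p + r == 0 ? 0.0 : 2 * p * r / (p + r);
    }
}
```

Note that every metric call is then just arithmetic on the counters, so the
data is scanned once regardless of how many metrics are requested.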
