Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/14 01:50:33 UTC

[GitHub] [incubator-mxnet] acphile opened a new issue #18046: Proposal to mxnet.metric

acphile opened a new issue #18046: Proposal to mxnet.metric
URL: https://github.com/apache/incubator-mxnet/issues/18046
 
 
   ## Motivation
   
   mxnet.metric provides methods for evaluating the performance of models, but it currently has several shortcomings. We propose to refactor the metrics interface to address them and to place the new interface under mx.gluon.metrics.
   
   In general, we want to make the following improvements:
   
   1. Move the API to the gluon namespace
   2. Make the API more user-friendly and Pythonic
   3. Structure the API so that hybridizing the complete training loop becomes more easily feasible in the future
   
   ### 1. Inconsistency in computational granularity of metrics
   
   Currently there are two computational granularities in mxnet.metric:
   
   1. “macro” level: calculates the average performance per batch, as in the implementation of [MAE](http://mxnet.incubator.apache.org/api/python/docs/api/metric/index.html?highlight=metric#mxnet.metric.MAE)
   2. “micro” level: calculates the average performance per sample, as in the implementations of [Accuracy](http://mxnet.incubator.apache.org/api/python/docs/api/metric/index.html?highlight=metric#mxnet.metric.Accuracy) and [CrossEntropy](http://mxnet.incubator.apache.org/api/python/docs/api/metric/index.html?highlight=metric#mxnet.metric.CrossEntropy)
   
   Generally, the “micro” level is more useful, because we usually care about the average performance over the data samples in the test set rather than over the test batches. The two levels give different results whenever batches differ in size, so we need to make these metrics consistent.
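
To illustrate the difference between the two granularities, here is a minimal NumPy sketch. The function names `mae_macro` and `mae_micro` are hypothetical and are not part of mxnet.metric; the sketch only assumes that each batch is a (label, pred) pair of 1-D arrays.

```python
import numpy as np

def mae_macro(batches):
    """Average of the per-batch MAE values: every batch has the same weight."""
    per_batch = [np.abs(label - pred).mean() for label, pred in batches]
    return float(np.mean(per_batch))

def mae_micro(batches):
    """MAE over all samples: every sample has the same weight."""
    total_error = sum(np.abs(label - pred).sum() for label, pred in batches)
    num_samples = sum(label.size for label, _ in batches)
    return float(total_error / num_samples)

# Two batches of unequal size: three samples with error 1, one sample with error 5.
batches = [
    (np.array([1.0, 2.0, 3.0]), np.array([2.0, 3.0, 4.0])),
    (np.array([10.0]), np.array([15.0])),
]
macro = mae_macro(batches)  # (1 + 5) / 2
micro = mae_micro(batches)  # (1 + 1 + 1 + 5) / 4
```

With equally sized batches the two values coincide; they diverge only when the batch sizes differ, as with a smaller last batch.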
   
   ### 2. For future hybridization of the complete training loop
   
   Currently, metrics in mxnet.metric receive a “list of NDArray” and compute their results with numpy. In fact, the computation of many metrics could be implemented in nn.HybridBlock. With HybridBlock.hybridize(), the computation could be done in the backend, which could be faster. By refactoring mxnet.metric, we could one day compile the model together with the metric, as in TensorFlow, and run the complete training loop, including evaluation, fully in the backend. Our new API design therefore takes the hybridization use case into account, so that hybridizing the complete training loop will be straightforward once the backend support is there.
   
   ### 3. Lacking some useful metrics
   
   Although many metrics are already included, some still need to be implemented.
   
   Apart from the metrics already provided in mxnet.metric (http://mxnet.incubator.apache.org/api/python/docs/api/metric/index.html?highlight=metric#module-mxnet.metric), we plan to add the following metrics:
   
   1. F-beta score: (1+beta^2)*precision*recall/(beta^2*precision+recall)
   2. binary accuracy with threshold: using a confidence threshold to judge whether the example is positive or negative
   3. MeanCosineSimilarity: returns the average cosine similarity between predictions and ground truth
   4. MeanPairwiseDistance: returns the average pairwise distance between predictions and ground truth
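
As a rough sketch of what the last two metrics compute, the following NumPy functions show one possible reading. The function names, the use of `eps` as a lower bound on the norm product, and the interpretation of `p` as the order of a p-norm along the last axis are assumptions made for illustration, not the final API.

```python
import numpy as np

def mean_cosine_similarity(pred, label, axis=-1, eps=1e-12):
    """Average cosine similarity between matching rows of pred and label."""
    dot = (pred * label).sum(axis=axis)
    norm = np.linalg.norm(pred, axis=axis) * np.linalg.norm(label, axis=axis)
    # eps guards against division by zero for all-zero vectors (assumed usage).
    return float((dot / np.maximum(norm, eps)).mean())

def mean_pairwise_distance(pred, label, p=2):
    """Average p-norm distance between matching rows of pred and label."""
    diff = np.abs(pred - label)
    dist = (diff ** p).sum(axis=-1) ** (1.0 / p)
    return float(dist.mean())

pred = np.array([[1.0, 0.0], [3.0, 4.0]])
label = np.array([[0.0, 1.0], [3.0, 4.0]])
cos = mean_cosine_similarity(pred, label)   # rows: orthogonal (0) and identical (1)
dist = mean_pairwise_distance(pred, label)  # rows: sqrt(2) and 0
```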
   
   ### 4. Fixing issues in the existing metrics
   
   Some special cases and input shapes need to be examined and fixed.
   About EvalMetric (the base class in metric.py):
   
   1. Distinction between local and global:
       a. Currently, for metrics in metric.py, when update() is called, both the local accumulator and the global accumulator are updated with the same value.
       b. The global accumulator may be useful when evaluation consists of different parts (for example, joint training on different datasets). You may want to get the evaluation result of one part and call “reset_local()” to continue the evaluation for the next part. In the end, you can call “get_global()” to obtain the overall evaluation performance.
       c. You may also define how the local and global results are updated in your own metric (a subclass of EvalMetric).
   2. Parameters “output_names” and “label_names”, and method “update_dict”
       a. I could only find “update_dict” used in “https://github.com/apache/incubator-mxnet/blob/48e9e2c6a1544843ba860124f4eaa8e7bac6100b/python/mxnet/module/executor_group.py”, where I think using “update” would also be reasonable.
       b. I don’t know where the corresponding parameters "output_names" and "label_names" could be used, since there are no corresponding examples.
   3. get_name_value()
       a. Returns pairs of the metric’s name and the metric’s evaluation value.
       b. It is helpful when using CompositeEvalMetric.
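
The local/global distinction described in item 1 can be sketched with a small stand-alone class. `RunningMean` is a hypothetical example written for this illustration; it mirrors the method names mentioned above (update, reset_local, get_global) but is not the EvalMetric implementation.

```python
class RunningMean:
    """Hypothetical metric that keeps a local and a global accumulator."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Clear both accumulators.
        self.global_sum = 0.0
        self.global_count = 0
        self.reset_local()

    def reset_local(self):
        # Clear only the local accumulator; the global one keeps its state.
        self.local_sum = 0.0
        self.local_count = 0

    def update(self, values):
        # Both accumulators receive the same update.
        batch_sum = float(sum(values))
        self.local_sum += batch_sum
        self.local_count += len(values)
        self.global_sum += batch_sum
        self.global_count += len(values)

    def get(self):
        return self.local_sum / self.local_count

    def get_global(self):
        return self.global_sum / self.global_count

metric = RunningMean()
metric.update([1.0, 0.0, 1.0, 1.0])  # first part of the evaluation
part1 = metric.get()
metric.reset_local()
metric.update([0.0, 0.0])            # second part of the evaluation
part2 = metric.get()
overall = metric.get_global()        # covers both parts
```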
   
   Here are the detailed changes to be made:
   
   1. Improve class MAE (and MSE, RMSE)
       a. Add a parameter “average”, default average=“macro”
             i. “macro” represents the average per batch
             ii. “micro” represents the average per example
       b. Add the micro-level calculation
   2. Improve class _BinaryClassification
       a. Support the case len(pred.shape)==1
             i. For binary classification, we only need to output a confidence score of being positive, like pred=[0.1,0.3,0.7] or pred=[[0.1],[0.3],[0.7]]
       b. Add a parameter “threshold”, default threshold=0.5
             i. Sometimes we need a threshold such that an example is classified as positive when confidence(positive) > threshold, and as negative otherwise
       c. Add a parameter “beta”, default beta=1
             i. Update the “fscore” calculation to F-beta = (1+beta^2)*precision*recall/(beta^2*precision+recall), which is more general
       d. Add a method binary_accuracy
             i. Calculation: (true_positives+true_negatives)/total_examples
   3. Improve class TopKAccuracy
       a. Line 578-579: self.global_sum_metric should be accumulated
   4. Add class MeanCosineSimilarity(axis=-1, eps=1e-12)
   5. Add class MeanPairwiseDistance(p=2)
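
The thresholding, F-beta and binary accuracy calculations in item 2 can be sketched as follows. The helper names (`binary_counts`, `fbeta`, `binary_accuracy`) and the choice to return 0 when a denominator is zero are assumptions for illustration; they are not the _BinaryClassification implementation.

```python
import numpy as np

def binary_counts(pred, label, threshold=0.5):
    """Count TP, FP, FN, TN from confidence scores of shape (N,) or (N, 1)."""
    pred = np.asarray(pred, dtype=float).reshape(-1)
    label = np.asarray(label).reshape(-1).astype(bool)
    pred_pos = pred > threshold  # positive when confidence > threshold
    tp = int(np.sum(pred_pos & label))
    fp = int(np.sum(pred_pos & ~label))
    fn = int(np.sum(~pred_pos & label))
    tn = int(np.sum(~pred_pos & ~label))
    return tp, fp, fn, tn

def fbeta(tp, fp, fn, beta=1.0):
    """F-beta = (1+beta^2)*precision*recall / (beta^2*precision + recall)."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom > 0 else 0.0

def binary_accuracy(tp, fp, fn, tn):
    """(true_positives + true_negatives) / total_examples."""
    return (tp + tn) / (tp + fp + fn + tn)

tp, fp, fn, tn = binary_counts([[0.1], [0.3], [0.7], [0.9]], [0, 1, 1, 1])
f1 = fbeta(tp, fp, fn, beta=1.0)
f2 = fbeta(tp, fp, fn, beta=2.0)
acc = binary_accuracy(tp, fp, fn, tn)
```

In this example precision is 1 and recall is 2/3, so increasing beta (which weights recall more heavily) lowers the score.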
   
   ## Comparison with other frameworks
   
   ### Compared with PyTorch Ignite
   
   Reference: https://pytorch.org/ignite/metrics.html
   The base class for metrics is implemented independently. Metrics in ignite.metrics use the .attach() method to consume the output of the engine’s process_function. This is done by registering event handlers on the engine via add_event_handler.
   Metric arithmetic is supported, which is similar to mxnet.metric.CustomMetric.
   Some of its metrics are currently not included in ours:
   
   1. [ConfusionMatrix](https://pytorch.org/ignite/metrics.html#ignite.metrics.ConfusionMatrix)
   2. [DiceCoefficient()](https://pytorch.org/ignite/metrics.html#ignite.metrics.DiceCoefficient)
   3. [IoU()](https://pytorch.org/ignite/metrics.html#ignite.metrics.IoU)
   4. [mIoU()](https://pytorch.org/ignite/metrics.html#ignite.metrics.mIoU)
   5. [MeanPairwiseDistance](https://pytorch.org/ignite/metrics.html#ignite.metrics.MeanPairwiseDistance)
   
   ### Compared with TensorFlow Keras
   
   Reference: https://tensorflow.google.cn/api_docs/python/tf/keras/metrics?hl=en
   The base class for metrics inherits from tf.keras.engine.base_layer.Layer, which is also the class from which all layers inherit. Metrics in tf.keras.metrics can be supplied via the metrics parameter when a model is compiled.
   Generally, metrics in tf.keras.metrics accept an input *sample_weight* that defines the contribution weights when updating the states.
   tf.keras.metrics uses [Accuracy](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/Accuracy) and [SparseCategoricalAccuracy](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/SparseCategoricalAccuracy) to distinguish the case where y_pred is a predicted label from the case where y_pred is a probability distribution, which I think may be to avoid internal shape checking. Currently we could combine them in one metric.
   Some of its metrics are currently not included in ours:
   
   1. [AUC](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/AUC)
   2. [BinaryAccuracy](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/BinaryAccuracy)
   3. Hinge-related metrics, like [SquaredHinge](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/SquaredHinge), [Hinge](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/Hinge) and [CategoricalHinge](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/CategoricalHinge)
   4. [CosineSimilarity](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/CosineSimilarity)
   5. [KLDivergence](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/KLDivergence)
   6. [LogCoshError](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/LogCoshError): logcosh = log((exp(x) + exp(-x))/2), where x is the error (y_pred - y_true)
   7. [MeanIoU](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/MeanIoU)
   8. [Poisson](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/Poisson)
   9. [SensitivityAtSpecificity](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/SensitivityAtSpecificity)
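
As a side note on the logcosh formula in item 6: evaluating exp(x) directly overflows for large errors, so an implementation would probably use an equivalent stable form. The sketch below is an assumption about how this could be done in NumPy (the function names are hypothetical); it relies on the identity log((exp(x) + exp(-x))/2) = |x| + log(1 + exp(-2|x|)) - log(2).

```python
import numpy as np

def logcosh(x):
    """Numerically stable log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2)."""
    a = np.abs(np.asarray(x, dtype=float))
    return a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0)

def logcosh_error(pred, label):
    """Mean logcosh of the error (pred - label)."""
    err = np.asarray(pred, dtype=float) - np.asarray(label, dtype=float)
    return float(logcosh(err).mean())

small = float(logcosh(0.5))     # agrees with the direct formula
large = float(logcosh(1000.0))  # stays finite, unlike log(cosh(1000.0))
zero_error = logcosh_error([1.0, 2.0], [1.0, 2.0])
```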
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] sxjscience commented on issue #18046: Proposal to mxnet.metric

Posted by GitBox <gi...@apache.org>.

sxjscience commented on issue #18046: Proposal to mxnet.metric
URL: https://github.com/apache/incubator-mxnet/issues/18046#issuecomment-614734035
 
 
   Also, I suggest removing the `macro` averaging option. I don't think the current implementation is correct. In scikit-learn, there is no `macro` option for MAE (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html#sklearn.metrics.mean_absolute_error) or MSE (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error). For the F1 score, the `macro` option is used for multi-label/multi-class prediction. See also: https://github.com/apache/incubator-mxnet/issues/9586#issuecomment-365427676


[GitHub] [incubator-mxnet] sxjscience commented on issue #18046: Proposal to mxnet.metric

Posted by GitBox <gi...@apache.org>.

sxjscience commented on issue #18046: Proposal to mxnet.metric
URL: https://github.com/apache/incubator-mxnet/issues/18046#issuecomment-613697594
 
 
   I think we can also borrow ideas from the design in AllenNLP: https://github.com/allenai/allennlp/tree/master/allennlp/training/metrics


[GitHub] [incubator-mxnet] leezu commented on issue #18046: Proposal to mxnet.metric

Posted by GitBox <gi...@apache.org>.

leezu commented on issue #18046:
URL: https://github.com/apache/incubator-mxnet/issues/18046#issuecomment-634985132


   Closed by https://github.com/apache/incubator-mxnet/pull/18083



[GitHub] [incubator-mxnet] acphile commented on issue #18046: Proposal to mxnet.metric

Posted by GitBox <gi...@apache.org>.

acphile commented on issue #18046:
URL: https://github.com/apache/incubator-mxnet/issues/18046#issuecomment-619826986


   Here are the updated changes to be made:
   ### 1. Improve class MAE, MSE, RMSE
   a. **UPD: remove “macro” support, which represents the average per batch**
   b. Rewrite RMSE to inherit from MSE
   ### 2. Improve class _BinaryClassification
   a. **UPD: add a parameter “class_type” in [‘binary’, ‘multiclass’, ‘multilabel’]**
   b. Support the case len(pred.shape)==1 for class_type='binary'
       i. For binary classification, we only need to output a confidence score of being positive, like pred=[0.1,0.3,0.7] or pred=[[0.1],[0.3],[0.7]]
   c. Add a parameter “threshold”, default threshold=0.5
       i. Sometimes we need a threshold such that an example is classified as positive when confidence(positive) > threshold, and as negative otherwise
       ii. Used when class_type in [‘binary’, ‘multilabel’]
   d. Add a parameter “beta”, default beta=1
       i. Update the “fscore” calculation to F-beta = (1+beta^2)\*precision\*recall/(beta^2\*precision+recall), which is more general
   e. **UPD: add cases for multilabel/multiclass**
       i. Add a parameter ‘class_type’ in [‘binary’, ‘multilabel’, ‘multiclass’]
       ii. For ‘multilabel’, pred should be (N, ..., C) and label should be (N, ..., C)
       iii. For ‘multiclass’, pred should be (N, ..., C) and label should be (N, ...)
   f. **UPD: replace global_fscore with micro_fscore**
   ### 3. Add class BinaryAccuracy(threshold=0.5)
   ### 4. Add class MeanCosineSimilarity(axis=-1, eps=1e-12)
   ### 5. Add class MeanPairwiseDistance(p=2)
   ### 6. Improve class F1:
   a. F1(class_type="binary", threshold=0.5, average="micro")
   b. **average in [“binary”, “micro”, “macro”]:**
       i. "macro": Calculate metrics for each label and return the unweighted mean of the f1 scores.
       ii. "micro": Calculate metrics globally by counting the total TP, FN and FP.
       iii. None: Return the f1 scores for each class (numpy.ndarray).
   ### 7. Add class Fbeta(class_type="binary", beta=1, threshold=0.5, average="micro")
   ### 8. **UPD: use mxnet.numpy instead of numpy**
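
The averaging modes listed for F1 can be illustrated for the multiclass case with a short NumPy sketch. It uses plain numpy rather than mxnet.numpy, works on already-decoded class indices, and the function names are hypothetical; it is meant only to show how "micro", "macro" and None differ.

```python
import numpy as np

def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)

def multiclass_f1(pred, label, num_classes, average="micro"):
    pred = np.asarray(pred)
    label = np.asarray(label)
    # Per-class counts of true positives, false positives and false negatives.
    tp = np.array([np.sum((pred == c) & (label == c)) for c in range(num_classes)])
    fp = np.array([np.sum((pred == c) & (label != c)) for c in range(num_classes)])
    fn = np.array([np.sum((pred != c) & (label == c)) for c in range(num_classes)])
    if average == "micro":
        # Count TP, FP and FN globally, then compute one score.
        return float(f1_from_counts(tp.sum(), fp.sum(), fn.sum()))
    per_class = f1_from_counts(tp, fp, fn)
    if average == "macro":
        # Unweighted mean of the per-class scores.
        return float(per_class.mean())
    return per_class  # average=None: one score per class

label = [0, 0, 0, 0, 1, 2]
pred = [0, 0, 0, 0, 2, 2]
micro_f1 = multiclass_f1(pred, label, num_classes=3, average="micro")
macro_f1 = multiclass_f1(pred, label, num_classes=3, average="macro")
per_class_f1 = multiclass_f1(pred, label, num_classes=3, average=None)
```

Here the rare class 1 is never predicted, which pulls the macro average well below the micro average.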



[GitHub] [incubator-mxnet] leezu closed issue #18046: Proposal to mxnet.metric

Posted by GitBox <gi...@apache.org>.

leezu closed issue #18046:
URL: https://github.com/apache/incubator-mxnet/issues/18046


   

