Posted to dev@samoa.apache.org by "Maciej Grzenda (JIRA)" <ji...@apache.org> on 2017/07/03 09:25:00 UTC

[jira] [Commented] (SAMOA-68) Saving true and predicted labels to file

    [ https://issues.apache.org/jira/browse/SAMOA-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072167#comment-16072167 ] 

Maciej Grzenda commented on SAMOA-68:
-------------------------------------

First of all, let me share the results of tests that measure the extra time needed to dump predictions to a file.
I also ran a couple of tests to compare the impact of saving predictions with that of the other output file already available (accuracies, written through -d dumpFile). Here are the results (100 000 instances, airlines data):

Columns 1-2: prediction dump at the given frequency (accuracy dump made only once, after all instances).
Columns 3-4: accuracy dump at the given frequency (prediction dump turned off).

Prediction dump          Accuracy dump
Frequency   Time [s]     Frequency   Time [s]
1           23           1           180
5           22           5           50
10          21           10          35
100         20           100         24
                
The less frequently we dump predictions, the lower the processing time. However, the impact of saving predictions to file (documented in the first two columns) looks acceptable to me: it is no more than 3 seconds in the test scenario. In particular, it seems to be much lower than the impact of saving accuracies (Kappa etc.) to file with the previously existing code (documented in columns 3 and 4, with an extra cost of up to 150+ seconds). Obviously, disk I/O plays its part in this. Taking into account that saving predictions is meant to be used for method development and debugging, the overhead of ca. 3 s / 20 s = 15% looks acceptable to me. Moreover, it will be reduced even further by the planned simplifications.
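
For anyone who wants to re-check the arithmetic, here is a minimal sketch that recomputes the overheads from the timings in the table above. It assumes that the run with the largest frequency value (100, i.e. the fewest writes) can serve as the baseline for its column; that baseline choice and the names used (prediction_dump_s, accuracy_dump_s, overhead) are illustrative only and not part of SAMOA.

```python
# Timings copied from the table above: frequency value -> total run time [s].
prediction_dump_s = {1: 23, 5: 22, 10: 21, 100: 20}
accuracy_dump_s = {1: 180, 5: 50, 10: 35, 100: 24}


def overhead(times_s):
    """Return {frequency: (extra seconds, extra time as a fraction of the baseline)}.

    Assumption: the run with the largest frequency value (fewest writes)
    is used as the baseline for its column.
    """
    baseline = times_s[max(times_s)]
    return {
        freq: (t - baseline, (t - baseline) / baseline)
        for freq, t in times_s.items()
    }


if __name__ == "__main__":
    columns = (
        ("prediction dump", prediction_dump_s),
        ("accuracy dump", accuracy_dump_s),
    )
    for name, times in columns:
        for freq, (extra_s, fraction) in sorted(overhead(times).items()):
            print(f"{name}: frequency={freq}, extra={extra_s} s ({fraction:.0%})")
```

Under that baseline assumption, the prediction dump at frequency 1 costs 3 s extra (15% of 20 s), while the accuracy dump at frequency 1 costs 156 s extra relative to its own frequency-100 run (24 s).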

> Saving true and predicted labels to file
> ----------------------------------------
>
>                 Key: SAMOA-68
>                 URL: https://issues.apache.org/jira/browse/SAMOA-68
>             Project: SAMOA
>          Issue Type: New Feature
>          Components: SAMOA-API
>            Reporter: Maciej Grzenda
>              Labels: features
>
> Currently PrequentialEvaluation task supports dumpFile option.  With this option model performance can be saved to a file. However, in some cases it would be good to save also individual predictions made by a model.  This is useful for model debugging and method development.
> This could be also used to visualize model output, calculate custom performance indicators (e.g. model accuracy for instances of a certain class or sharing the same feature value).  Such saving of model output (if done) should be made for every instance. Hence, a new option making it possible to dump predictions to a separate file seems justified.  For classification, it should include votes made for individual classes, if available.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)