Posted to issues@spark.apache.org by "yuhao yang (JIRA)" <ji...@apache.org> on 2017/06/13 22:29:00 UTC

[jira] [Updated] (SPARK-20602) Adding LBFGS as optimizer for LinearSVC

     [ https://issues.apache.org/jira/browse/SPARK-20602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yuhao yang updated SPARK-20602:
-------------------------------
    Description: 
Currently LinearSVC in Spark only supports OWLQN as the optimizer (see https://issues.apache.org/jira/browse/SPARK-14709). I compared LBFGS and OWLQN on several public datasets and found that LBFGS converges much faster for LinearSVC in most cases.

The following table shows, for each optimizer and loss, the number of training iterations until convergence, with the F1 score in parentheses.

||Dataset||LBFGS with hinge||OWLQN with hinge||LBFGS with squared_hinge||
|news20.binary|31 (0.99)|413 (0.99)|185 (0.99)|
|mushroom|28 (1.0)|170 (1.0)|24 (1.0)|
|madelon|143 (0.75)|8129 (0.70)|823 (0.74)|
|breast-cancer-scale|15 (1.0)|16 (1.0)|15 (1.0)|
|phishing|329 (0.94)|231 (0.94)|67 (0.94)|
|a1a (adult)|466 (0.87)|282 (0.87)|77 (0.86)|
|a7a|237 (0.84)|372 (0.84)|69 (0.84)|

data source: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html
training code: new LinearSVC().setMaxIter(10000).setTol(1e-6)
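The table compares two loss functions. L-BFGS is designed for smooth objectives. The hinge loss has a kink where the margin y·(w·x) equals 1, but the squared hinge loss is differentiable everywhere. The following Scala sketch contrasts the two per-example losses and their (sub)gradients. It assumes the usual textbook definitions with labels in {-1, +1} and omits the regularization term. `SvmLosses` and its methods are hypothetical helpers, not Spark's internal LinearSVC code.

```scala
// Hypothetical helper object for illustration; not part of Spark's API.
// Assumes labels y in {-1, +1} and omits the regularization term.
object SvmLosses {
  private def dot(w: Array[Double], x: Array[Double]): Double =
    w.indices.map(i => w(i) * x(i)).sum

  /** Hinge loss max(0, 1 - y * w.x). */
  def hinge(w: Array[Double], x: Array[Double], y: Double): Double =
    math.max(0.0, 1.0 - y * dot(w, x))

  /** One subgradient of the hinge loss: -y * x when the margin is below 1, else 0.
    * At margin == 1 the loss has a kink, and this sketch picks the zero subgradient. */
  def hingeSubgradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] =
    if (y * dot(w, x) < 1.0) x.map(xi => -y * xi) else Array.fill(x.length)(0.0)

  /** Squared hinge loss max(0, 1 - y * w.x)^2, differentiable everywhere. */
  def squaredHinge(w: Array[Double], x: Array[Double], y: Double): Double = {
    val h = hinge(w, x, y)
    h * h
  }

  /** Gradient of the squared hinge loss: -2 * max(0, 1 - y * w.x) * y * x. */
  def squaredHingeGradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
    val h = hinge(w, x, y)
    x.map(xi => -2.0 * h * y * xi)
  }
}
```

The hinge loss is not differentiable at margin 1, so `hingeSubgradient` has to choose a value there (this sketch chooses zero). `squaredHingeGradient` has no such ambiguity, because the factor `h` goes to zero at the kink.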

LBFGS requires fewer iterations in most cases (the exceptions with hinge loss are phishing and a1a) and is probably a better default optimizer.
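The iteration counts in the table can be tallied with a short Scala sketch to make "in most cases" concrete. The numbers are copied from the table, with F1 scores omitted. `IterationComparison` and `Row` are hypothetical names used only for illustration.

```scala
// Hypothetical helper; the figures are the reporter's iteration counts from the table.
object IterationComparison {
  final case class Row(dataset: String, lbfgsHinge: Int, owlqnHinge: Int, lbfgsSquaredHinge: Int)

  val rows: Seq[Row] = Seq(
    Row("news20.binary", 31, 413, 185),
    Row("mushroom", 28, 170, 24),
    Row("madelon", 143, 8129, 823),
    Row("breast-cancer-scale", 15, 16, 15),
    Row("phishing", 329, 231, 67),
    Row("a1a", 466, 282, 77),
    Row("a7a", 237, 372, 69)
  )

  /** OWLQN iterations divided by LBFGS (hinge) iterations; > 1 means LBFGS needed fewer. */
  def speedup(r: Row): Double = r.owlqnHinge.toDouble / r.lbfgsHinge

  /** Datasets on which LBFGS with hinge loss needed more iterations than OWLQN. */
  val lbfgsHingeSlower: Seq[String] =
    rows.filter(r => r.lbfgsHinge > r.owlqnHinge).map(_.dataset)

  /** Datasets on which LBFGS with squared hinge loss needed more iterations than OWLQN. */
  val lbfgsSquaredHingeSlower: Seq[String] =
    rows.filter(r => r.lbfgsSquaredHinge > r.owlqnHinge).map(_.dataset)

  def main(args: Array[String]): Unit = {
    rows.foreach(r => println(f"${r.dataset}%-20s OWLQN/LBFGS = ${speedup(r)}%.2f"))
    println(s"LBFGS (hinge) slower on: ${lbfgsHingeSlower.mkString(", ")}")
    println(s"LBFGS (squared_hinge) slower on: ${lbfgsSquaredHingeSlower.mkString(", ")}")
  }
}
```

On these figures, LBFGS with hinge loss needs fewer iterations than OWLQN on 5 of the 7 datasets (phishing and a1a are the exceptions). LBFGS with squared hinge loss needs fewer iterations than OWLQN on all seven, although its F1 score is slightly lower on madelon and a1a.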



  was:
Currently LinearSVC in Spark only supports OWLQN as the optimizer (see https://issues.apache.org/jira/browse/SPARK-14709). I compared LBFGS and OWLQN on several public datasets and found that LBFGS converges much faster for LinearSVC in most cases.

The following table shows the number of training iterations until convergence for both optimizers, with the F1 score in parentheses.

||Dataset||LBFGS||OWLQN||
|news20.binary|31 (0.99)|413 (0.99)|
|mushroom|28 (1.0)|170 (1.0)|
|madelon|143 (0.75)|8129 (0.70)|
|breast-cancer-scale|15 (1.0)|16 (1.0)|
|phishing|329 (0.94)|231 (0.94)|
|a1a (adult)|466 (0.87)|282 (0.87)|
|a7a|237 (0.84)|372 (0.84)|

data source: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html
training code: new LinearSVC().setMaxIter(10000).setTol(1e-6)

LBFGS requires fewer iterations in most cases (the exceptions are phishing and a1a) and is probably a better default optimizer.




> Adding LBFGS as optimizer for LinearSVC
> ---------------------------------------
>
>                 Key: SPARK-20602
>                 URL: https://issues.apache.org/jira/browse/SPARK-20602
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: yuhao yang
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
