You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Debasish Das (JIRA)" <ji...@apache.org> on 2014/11/20 03:13:34 UTC

[jira] [Comment Edited] (SPARK-2426) Quadratic Minimization for MLlib ALS

    [ https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218891#comment-14218891 ] 

Debasish Das edited comment on SPARK-2426 at 11/20/14 2:13 AM:
---------------------------------------------------------------

With the MAP measures being added to examples.MovieLensALS through https://issues.apache.org/jira/browse/SPARK-4231 I compared the quality and runtime of the matrix completion formulations on MovieLens 1M dataset:

Default: userConstraint L2, productConstraint L2 lambdaUser=lambdaProduct=0.065 rank=100 iterations 10

Test RMSE = 0.8436480113821955.
Test users 6038 MAP 0.05860164548002782

Solver: Cholesky decomposition followed by forward-backward solves

Per iteration runtime for baseline (solveTime in ms)

14/11/19 17:37:06 INFO ALS: usersOrProducts 924 slowConvergence 0 QuadraticMinimizer solveTime 362.813 Iters 0
14/11/19 17:37:06 INFO ALS: usersOrProducts 910 slowConvergence 0 QuadraticMinimizer solveTime 314.527 Iters 0
14/11/19 17:37:06 INFO ALS: usersOrProducts 927 slowConvergence 0 QuadraticMinimizer solveTime 265.75 Iters 0
14/11/19 17:37:06 INFO ALS: usersOrProducts 918 slowConvergence 0 QuadraticMinimizer solveTime 271.513 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1510 slowConvergence 0 QuadraticMinimizer solveTime 370.177 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1512 slowConvergence 0 QuadraticMinimizer solveTime 467.994 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1507 slowConvergence 0 QuadraticMinimizer solveTime 511.894 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1511 slowConvergence 0 QuadraticMinimizer solveTime 481.189 Iters 0

NMF: userConstraint POSITIVE, productConstraint POSITIVE, userLambda=productLambda=0.065 L2 regularization

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800670, test: 199539.
Quadratic minimization userConstraint POSITIVE productConstraint POSITIVE
Test RMSE = 0.8435335132641906.
Test users 6038 MAP 0.056361816590625446

ALS iteration1 runtime:

QuadraticMinimizer convergence profile:

14/11/19 17:46:46 INFO ALS: usersOrProducts 918 slowConvergence 0 QuadraticMinimizer solveTime 1936.281 Iters 73132
14/11/19 17:46:46 INFO ALS: usersOrProducts 927 slowConvergence 0 QuadraticMinimizer solveTime 1871.364 Iters 75219
14/11/19 17:46:46 INFO ALS: usersOrProducts 910 slowConvergence 0 QuadraticMinimizer solveTime 2067.735 Iters 73180
14/11/19 17:46:46 INFO ALS: usersOrProducts 924 slowConvergence 0 QuadraticMinimizer solveTime 2127.161 Iters 75546
14/11/19 17:46:53 INFO ALS: usersOrProducts 1507 slowConvergence 0 QuadraticMinimizer solveTime 3813.923 Iters 193207
14/11/19 17:46:54 INFO ALS: usersOrProducts 1511 slowConvergence 0 QuadraticMinimizer solveTime 3894.068 Iters 196882
14/11/19 17:46:54 INFO ALS: usersOrProducts 1510 slowConvergence 0 QuadraticMinimizer solveTime 3875.915 Iters 193987
14/11/19 17:46:54 INFO ALS: usersOrProducts 1512 slowConvergence 0 QuadraticMinimizer solveTime 3939.765 Iters 192471

NNLS convergence profile:

14/11/19 17:46:46 INFO ALS: NNLS solveTime 252.909 iters 7381
14/11/19 17:46:46 INFO ALS: NNLS solveTime 256.803 iters 7740
14/11/19 17:46:46 INFO ALS: NNLS solveTime 274.352 iters 7491
14/11/19 17:46:46 INFO ALS: NNLS solveTime 272.971 iters 7664
14/11/19 17:46:53 INFO ALS: NNLS solveTime 1487.262 iters 60338
14/11/19 17:46:54 INFO ALS: NNLS solveTime 1472.742 iters 61321
14/11/19 17:46:54 INFO ALS: NNLS solveTime 1489.863 iters 62228
14/11/19 17:46:54 INFO ALS: NNLS solveTime 1494.192 iters 60489

ALS iteration 10

Quadratic Minimizer convergence profile:

14/11/19 17:48:17 INFO ALS: usersOrProducts 924 slowConvergence 0 QuadraticMinimizer solveTime 1082.056 Iters 53724
14/11/19 17:48:17 INFO ALS: usersOrProducts 910 slowConvergence 0 QuadraticMinimizer solveTime 1180.601 Iters 50593
14/11/19 17:48:17 INFO ALS: usersOrProducts 927 slowConvergence 0 QuadraticMinimizer solveTime 1106.131 Iters 53069
14/11/19 17:48:17 INFO ALS: usersOrProducts 918 slowConvergence 0 QuadraticMinimizer solveTime 1108.478 Iters 50895
14/11/19 17:48:23 INFO ALS: usersOrProducts 1510 slowConvergence 0 QuadraticMinimizer solveTime 2262.193 Iters 116818
14/11/19 17:48:23 INFO ALS: usersOrProducts 1512 slowConvergence 0 QuadraticMinimizer solveTime 2293.64 Iters 116026
14/11/19 17:48:23 INFO ALS: usersOrProducts 1507 slowConvergence 0 QuadraticMinimizer solveTime 2241.491 Iters 116293
14/11/19 17:48:23 INFO ALS: usersOrProducts 1511 slowConvergence 0 QuadraticMinimizer solveTime 2372.957 Iters 118391

NNLS convergence profile:

14/11/19 17:48:17 INFO ALS: NNLS solveTime 623.031 iters 21611
14/11/19 17:48:17 INFO ALS: NNLS solveTime 553.493 iters 21732
14/11/19 17:48:17 INFO ALS: NNLS solveTime 559.9 iters 22511
14/11/19 17:48:17 INFO ALS: NNLS solveTime 556.654 iters 21330
14/11/19 17:48:23 INFO ALS: NNLS solveTime 1672.582 iters 86006
14/11/19 17:48:23 INFO ALS: NNLS solveTime 1703.221 iters 85824
14/11/19 17:48:23 INFO ALS: NNLS solveTime 1826.252 iters 85403
14/11/19 17:48:23 INFO ALS: NNLS solveTime 1753.859 iters 86559

NNLS looks faster but the algorithms are not running same convergence criteria. I am following primal dual convergence similar to IP based solver which I feel can be optimized further. ABSTOL right now is at 1e-8 which gives MOSEK like quality for well conditioned gram matrices like what shows up in ALS.
 
Sparse coding: userConstraint SMOOTH, productConstraint SPARSE, userLambda=0.065 productLambda=0.065 ElasticNet regularization 

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800670, test: 199539.
Quadratic minimization userConstraint SMOOTH productConstraint SPARSE
Test RMSE = 0.886351402513496.
Test users 6038 MAP 0.03141036089472268

ALS iteration 1

QuadraticMinimizer convergence profile:

14/11/19 17:56:45 INFO ALS: usersOrProducts 910 slowConvergence 0 QuadraticMinimizer solveTime 1641.588 Iters 63077
14/11/19 17:56:46 INFO ALS: usersOrProducts 918 slowConvergence 0 QuadraticMinimizer solveTime 1702.631 Iters 61617
14/11/19 17:56:46 INFO ALS: usersOrProducts 924 slowConvergence 0 QuadraticMinimizer solveTime 1802.781 Iters 66567
14/11/19 17:56:46 INFO ALS: usersOrProducts 927 slowConvergence 0 QuadraticMinimizer solveTime 1911.827 Iters 65827
14/11/19 17:56:48 INFO ALS: usersOrProducts 1507 slowConvergence 0 QuadraticMinimizer solveTime 456.11 Iters 0
14/11/19 17:56:48 INFO ALS: usersOrProducts 1510 slowConvergence 0 QuadraticMinimizer solveTime 404.214 Iters 0
14/11/19 17:56:48 INFO ALS: usersOrProducts 1511 slowConvergence 0 QuadraticMinimizer solveTime 459.732 Iters 0
14/11/19 17:56:49 INFO ALS: usersOrProducts 1512 slowConvergence 0 QuadraticMinimizer solveTime 409.18 Iters 0

ALS iteration 10:

QuadraticMinimizer convergence profile

14/11/19 17:57:47 INFO ALS: usersOrProducts 910 slowConvergence 0 QuadraticMinimizer solveTime 2135.688 Iters 125344
14/11/19 17:57:47 INFO ALS: usersOrProducts 918 slowConvergence 0 QuadraticMinimizer solveTime 2240.827 Iters 125399
14/11/19 17:57:47 INFO ALS: usersOrProducts 924 slowConvergence 0 QuadraticMinimizer solveTime 2202.971 Iters 126542
14/11/19 17:57:48 INFO ALS: usersOrProducts 927 slowConvergence 0 QuadraticMinimizer solveTime 2220.846 Iters 129391
14/11/19 17:57:50 INFO ALS: usersOrProducts 1510 slowConvergence 0 QuadraticMinimizer solveTime 358.622 Iters 0
14/11/19 17:57:50 INFO ALS: usersOrProducts 1511 slowConvergence 0 QuadraticMinimizer solveTime 352.825 Iters 0
14/11/19 17:57:50 INFO ALS: usersOrProducts 1507 slowConvergence 0 QuadraticMinimizer solveTime 350.971 Iters 0
14/11/19 17:57:50 INFO ALS: usersOrProducts 1512 slowConvergence 0 QuadraticMinimizer solveTime 298.721 Iters 0

I will run Netflix dataset just to see if Sparse coding/POSITIVITY can improve MAP but on MovieLens1M looks like winner is the default formulation.

QuadraticMinimizer supports lot more features than what we showed for recommendation datasets. Next set of experiments will focus on LDA datasets from https://issues.apache.org/jira/browse/SPARK-1405 for LSA comparisons.


was (Author: debasish83):
With the MAP measures being added to examples.MovieLensALS through https://issues.apache.org/jira/browse/SPARK-4231 I compared the quality and runtime of the matrix completion formulations on MovieLens 1M dataset:

Default: userConstraint L2, productConstraint L2 lambdaUser=lambdaProduct=0.065 rank=100 iterations 10

Test RMSE = 0.8436480113821955.
Test users 6038 MAP 0.05860164548002782

Solver: Cholesky decomposition followed by forward-backward solves

Per iteration runtime for baseline (solveTime in ms)

14/11/19 17:37:06 INFO ALS: usersOrProducts 924 slowConvergence 0 QuadraticMinimizer solveTime 362.813 Iters 0
14/11/19 17:37:06 INFO ALS: usersOrProducts 910 slowConvergence 0 QuadraticMinimizer solveTime 314.527 Iters 0
14/11/19 17:37:06 INFO ALS: usersOrProducts 927 slowConvergence 0 QuadraticMinimizer solveTime 265.75 Iters 0
14/11/19 17:37:06 INFO ALS: usersOrProducts 918 slowConvergence 0 QuadraticMinimizer solveTime 271.513 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1510 slowConvergence 0 QuadraticMinimizer solveTime 370.177 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1512 slowConvergence 0 QuadraticMinimizer solveTime 467.994 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1507 slowConvergence 0 QuadraticMinimizer solveTime 511.894 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1511 slowConvergence 0 QuadraticMinimizer solveTime 481.189 Iters 0

NMF: userConstraint POSITIVE, productConstraint POSITIVE, userLambda=productLambda=0.065 L2 regularization

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800670, test: 199539.
Quadratic minimization userConstraint POSITIVE productConstraint POSITIVE
Test RMSE = 0.8435335132641906.
Test users 6038 MAP 0.056361816590625446

ALS iteration1 runtime:

QuadraticMinimizer convergence profile:

14/11/19 17:46:46 INFO ALS: usersOrProducts 918 slowConvergence 0 QuadraticMinimizer solveTime 1936.281 Iters 73132
14/11/19 17:46:46 INFO ALS: usersOrProducts 927 slowConvergence 0 QuadraticMinimizer solveTime 1871.364 Iters 75219
14/11/19 17:46:46 INFO ALS: usersOrProducts 910 slowConvergence 0 QuadraticMinimizer solveTime 2067.735 Iters 73180
14/11/19 17:46:46 INFO ALS: usersOrProducts 924 slowConvergence 0 QuadraticMinimizer solveTime 2127.161 Iters 75546
14/11/19 17:46:53 INFO ALS: usersOrProducts 1507 slowConvergence 0 QuadraticMinimizer solveTime 3813.923 Iters 193207
14/11/19 17:46:54 INFO ALS: usersOrProducts 1511 slowConvergence 0 QuadraticMinimizer solveTime 3894.068 Iters 196882
14/11/19 17:46:54 INFO ALS: usersOrProducts 1510 slowConvergence 0 QuadraticMinimizer solveTime 3875.915 Iters 193987
14/11/19 17:46:54 INFO ALS: usersOrProducts 1512 slowConvergence 0 QuadraticMinimizer solveTime 3939.765 Iters 192471

NNLS convergence profile:

14/11/19 17:46:46 INFO ALS: NNLS solveTime 252.909 iters 7381
14/11/19 17:46:46 INFO ALS: NNLS solveTime 256.803 iters 7740
14/11/19 17:46:46 INFO ALS: NNLS solveTime 274.352 iters 7491
14/11/19 17:46:46 INFO ALS: NNLS solveTime 272.971 iters 7664
14/11/19 17:46:53 INFO ALS: NNLS solveTime 1487.262 iters 60338
14/11/19 17:46:54 INFO ALS: NNLS solveTime 1472.742 iters 61321
14/11/19 17:46:54 INFO ALS: NNLS solveTime 1489.863 iters 62228
14/11/19 17:46:54 INFO ALS: NNLS solveTime 1494.192 iters 60489

ALS iteration 10

Quadratic Minimizer convergence profile:

14/11/19 17:48:17 INFO ALS: usersOrProducts 924 slowConvergence 0 QuadraticMinimizer solveTime 1082.056 Iters 53724
14/11/19 17:48:17 INFO ALS: usersOrProducts 910 slowConvergence 0 QuadraticMinimizer solveTime 1180.601 Iters 50593
14/11/19 17:48:17 INFO ALS: usersOrProducts 927 slowConvergence 0 QuadraticMinimizer solveTime 1106.131 Iters 53069
14/11/19 17:48:17 INFO ALS: usersOrProducts 918 slowConvergence 0 QuadraticMinimizer solveTime 1108.478 Iters 50895
14/11/19 17:48:23 INFO ALS: usersOrProducts 1510 slowConvergence 0 QuadraticMinimizer solveTime 2262.193 Iters 116818
14/11/19 17:48:23 INFO ALS: usersOrProducts 1512 slowConvergence 0 QuadraticMinimizer solveTime 2293.64 Iters 116026
14/11/19 17:48:23 INFO ALS: usersOrProducts 1507 slowConvergence 0 QuadraticMinimizer solveTime 2241.491 Iters 116293
14/11/19 17:48:23 INFO ALS: usersOrProducts 1511 slowConvergence 0 QuadraticMinimizer solveTime 2372.957 Iters 118391

NNLS convergence profile:

14/11/19 17:48:17 INFO ALS: NNLS solveTime 623.031 iters 21611
14/11/19 17:48:17 INFO ALS: NNLS solveTime 553.493 iters 21732
14/11/19 17:48:17 INFO ALS: NNLS solveTime 559.9 iters 22511
14/11/19 17:48:17 INFO ALS: NNLS solveTime 556.654 iters 21330
14/11/19 17:48:23 INFO ALS: NNLS solveTime 1672.582 iters 86006
14/11/19 17:48:23 INFO ALS: NNLS solveTime 1703.221 iters 85824
14/11/19 17:48:23 INFO ALS: NNLS solveTime 1826.252 iters 85403
14/11/19 17:48:23 INFO ALS: NNLS solveTime 1753.859 iters 86559

NNLS looks faster but both the algorithms are running same convergence criteria. I am following primal dual convergence similar to IP based solver which I feel can be optimized further. ABSTOL right now is at 1e-8 which gives MOSEK like quality for well conditioned gram matrices like what shows up in ALS.
 
Sparse coding: userConstraint SMOOTH, productConstraint SPARSE, userLambda=0.065 productLambda=0.065 ElasticNet regularization 

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800670, test: 199539.
Quadratic minimization userConstraint SMOOTH productConstraint SPARSE
Test RMSE = 0.886351402513496.
Test users 6038 MAP 0.03141036089472268

ALS iteration 1

QuadraticMinimizer convergence profile:

14/11/19 17:56:45 INFO ALS: usersOrProducts 910 slowConvergence 0 QuadraticMinimizer solveTime 1641.588 Iters 63077
14/11/19 17:56:46 INFO ALS: usersOrProducts 918 slowConvergence 0 QuadraticMinimizer solveTime 1702.631 Iters 61617
14/11/19 17:56:46 INFO ALS: usersOrProducts 924 slowConvergence 0 QuadraticMinimizer solveTime 1802.781 Iters 66567
14/11/19 17:56:46 INFO ALS: usersOrProducts 927 slowConvergence 0 QuadraticMinimizer solveTime 1911.827 Iters 65827
14/11/19 17:56:48 INFO ALS: usersOrProducts 1507 slowConvergence 0 QuadraticMinimizer solveTime 456.11 Iters 0
14/11/19 17:56:48 INFO ALS: usersOrProducts 1510 slowConvergence 0 QuadraticMinimizer solveTime 404.214 Iters 0
14/11/19 17:56:48 INFO ALS: usersOrProducts 1511 slowConvergence 0 QuadraticMinimizer solveTime 459.732 Iters 0
14/11/19 17:56:49 INFO ALS: usersOrProducts 1512 slowConvergence 0 QuadraticMinimizer solveTime 409.18 Iters 0

ALS iteration 10:

QuadraticMinimizer convergence profile

14/11/19 17:57:47 INFO ALS: usersOrProducts 910 slowConvergence 0 QuadraticMinimizer solveTime 2135.688 Iters 125344
14/11/19 17:57:47 INFO ALS: usersOrProducts 918 slowConvergence 0 QuadraticMinimizer solveTime 2240.827 Iters 125399
14/11/19 17:57:47 INFO ALS: usersOrProducts 924 slowConvergence 0 QuadraticMinimizer solveTime 2202.971 Iters 126542
14/11/19 17:57:48 INFO ALS: usersOrProducts 927 slowConvergence 0 QuadraticMinimizer solveTime 2220.846 Iters 129391
14/11/19 17:57:50 INFO ALS: usersOrProducts 1510 slowConvergence 0 QuadraticMinimizer solveTime 358.622 Iters 0
14/11/19 17:57:50 INFO ALS: usersOrProducts 1511 slowConvergence 0 QuadraticMinimizer solveTime 352.825 Iters 0
14/11/19 17:57:50 INFO ALS: usersOrProducts 1507 slowConvergence 0 QuadraticMinimizer solveTime 350.971 Iters 0
14/11/19 17:57:50 INFO ALS: usersOrProducts 1512 slowConvergence 0 QuadraticMinimizer solveTime 298.721 Iters 0

I will run Netflix dataset just to see if Sparse coding/POSITIVITY can improve MAP but on MovieLens1M looks like winner is the default formulation.

QuadraticMinimizer supports lot more features than what we showed for recommendation datasets. Next set of experiments will focus on LDA datasets from https://issues.apache.org/jira/browse/SPARK-1405 for LSA comparisons.

> Quadratic Minimization for MLlib ALS
> ------------------------------------
>
>                 Key: SPARK-2426
>                 URL: https://issues.apache.org/jira/browse/SPARK-2426
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Debasish Das
>            Assignee: Debasish Das
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> Current ALS supports least squares and nonnegative least squares.
> I presented ADMM and IPM based Quadratic Minimization solvers to be used for the following ALS problems:
> 1. ALS with bounds
> 2. ALS with L1 regularization
> 3. ALS with Equality constraint and bounds
> Initial runtime comparisons are presented at Spark Summit. 
> http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark
> Based on Xiangrui's feedback I am currently comparing the ADMM based Quadratic Minimization solvers with IPM based QpSolvers and the default ALS/NNLS. I will keep updating the runtime comparison results.
> For integration the detailed plan is as follows:
> 1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization
> 2. Integrate QuadraticMinimizer in mllib ALS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org