Posted to dev@spark.apache.org by Debasish Das <de...@gmail.com> on 2014/06/06 00:38:58 UTC

Constraint Solver for Spark

Hi,

We are adding a constrained ALS solver in Spark to solve matrix
factorization use cases that need additional constraints (bounds,
equality, inequality, and quadratic constraints).
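
For concreteness, each per-user subproblem then becomes a small QP with the
extra constraints attached (a sketch in the Y^T Y / Y^T b notation used
later in this thread; the constraint data A_eq, A_in, Q, q, r stand for
whatever the application supplies):

\begin{aligned}
\min_x \quad & \tfrac{1}{2}\, x^T (Y^T Y + \lambda I)\, x - (Y^T b)^T x \\
\text{s.t.} \quad & lb \le x \le ub, \quad A_{eq} x = b_{eq}, \quad
A_{in} x \le b_{in}, \quad x^T Q x + q^T x \le r .
\end{aligned}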

We are using a native version of a primal-dual SOCP solver due to its small
memory footprint and its use of sparse CCS matrix computation. The solver
depends on the AMD and LDL packages from Timothy Davis for sparse CCS
matrix algebra (released under LGPL).

Due to the GPL dependencies, it won't be possible to release the code under
the Apache license for now. If we get good results on our use cases, we
plan to write a version in breeze or modify joptimizer for sparse CCS
operations.

I derived ConstrainedALS from Spark mllib ALS and am comparing its
performance against default ALS and non-negative ALS as baselines. The plan
is to release the code under the GPL license for community review. I have
kept the package structure as org.apache.spark.mllib.recommendation

There are some private functions defined in ALS that I would like to reuse.
Is it possible to remove the private modifier from the following functions:

1. makeLinkRDDs
2. makeInLinkBlock
3. makeOutLinkBlock
4. randomFactor
5. unblockFactors

I don't want to copy any code. I can open a PR to make these changes.

Thanks.
Deb

Re: Constraint Solver for Spark

Posted by Xiangrui Meng <me...@gmail.com>.
Hey Deb,

If your goal is to solve the subproblems in ALS, exploiting sparsity
doesn't give you much benefit because the data is small and dense.
Porting either ECOS's or PDCO's implementation but using a dense
representation should be sufficient. Feel free to open a JIRA and we
can move our discussion there.

Best,
Xiangrui


Re: Constraint Solver for Spark

Posted by Debasish Das <de...@gmail.com>.
I looked further and realized that ECOS uses a MEX file while PDCO uses
pure MATLAB code, so the out-of-the-box runtime comparison is not fair.

I am trying to generate a C port of PDCO. Like ECOS, PDCO also makes use of
the sparse support from Tim Davis.

Thanks.
Deb

Re: Constraint Solver for Spark

Posted by Debasish Das <de...@gmail.com>.
Hi Xiangrui,

I did some out-of-the-box comparisons with ECOS and PDCO from SOL.

Both of them seem to run at par, but I will do a more detailed analysis.

I used PDCO's testQP randomized problem generation; pdcotestQP(m, n) means
m constraints and n variables.

For an ECOS runtime reference, here is the paper:
http://web.stanford.edu/~boyd/papers/pdf/ecos_ecc.pdf

It runs at par with Gurobi but slower than MOSEK. Note that MOSEK is also a
SOCP solver.

1. With a smaller problem with 50 constraints and 100 variables:

K>> pdcotestQP(50, 100)

ECOSQP: Converting QP to SOCP...
ECOSQP: Time for Cholesky: 0.00 seconds
Conversion completed. Calling ECOS...
ECOS - (c) A. Domahidi, Automatic Control Laboratory, ETH Zurich, 2012-2014.
OPTIMAL (within feastol=1.0e-05, reltol=1.0e-06, abstol=1.0e-06).
Runtime: 0.010340 seconds.

   --------------------------------------------------------
   pdco.m                            Version of 23 Nov 2013
   Primal-dual barrier method to minimize a convex function
   subject to linear constraints Ax + r = b,  bl <= x <= bu

   Michael Saunders       SOL and ICME, Stanford University
   Contributors:     Byunggyoo Kim (SOL), Chris Maes (ICME)
                     Santiago Akle (ICME), Matt Zahr (ICME)
   --------------------------------------------------------

m        =       50     n        =      100      nnz(A)  =      483
Method   =       21     (1=chol  2=QR  3=LSMR  4=MINRES  21=SQD(LU)  22=SQD(MA57))

Elapsed time is 0.050226 seconds.

2. With a larger problem with 50 equality constraints and 1000 variables:

K>> pdcotestQP(50, 1000)

ECOSQP: Converting QP to SOCP...
ECOSQP: Time for Cholesky: 0.05 seconds
Conversion completed. Calling ECOS...
ECOS - (c) A. Domahidi, Automatic Control Laboratory, ETH Zurich, 2012-2014.
OPTIMAL (within feastol=1.0e-05, reltol=1.0e-06, abstol=1.0e-06).
Runtime: 6.333036 seconds.

   --------------------------------------------------------
   pdco.m                            Version of 23 Nov 2013
   Primal-dual barrier method to minimize a convex function
   subject to linear constraints Ax + r = b,  bl <= x <= bu

   Michael Saunders       SOL and ICME, Stanford University
   Contributors:     Byunggyoo Kim (SOL), Chris Maes (ICME)
                     Santiago Akle (ICME), Matt Zahr (ICME)
   --------------------------------------------------------

The objective is defined by a function handle:

   @(x)deal(0.5*(x'*H*x)+c'*x,H*x+c,H)

The matrix A is an explicit sparse matrix

m        =       50     n        =     1000      nnz(A)  =     4842
Method   =       21     (1=chol  2=QR  3=LSMR  4=MINRES  21=SQD(LU)  22=SQD(MA57))

Elapsed time is 7.531934 seconds.

I will change Method = 21 (LU) to 1 (chol), and that should help PDCO. Even
if both IPMs are at par, it is still a good idea to choose ECOS as the
generic IPM since it can solve conic programs, which are a superset of
quadratic programs (robust portfolio optimization from quantitative finance
is an example of a standard conic program).
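
For reference, the "Converting QP to SOCP" step in the ECOSQP logs above is
the standard reduction (a sketch; W is any factor with H = W^T W, e.g. the
Cholesky factor whose timing the log reports):

\min_x \; \tfrac{1}{2} x^T H x + c^T x \;\;\text{s.t.}\;\; Ax \le b
\quad\Longleftrightarrow\quad
\min_{x,t} \; t + c^T x \;\;\text{s.t.}\;\;
\left\| \begin{pmatrix} \sqrt{2}\, W x \\ t - 1 \end{pmatrix} \right\|_2
\le t + 1, \;\; Ax \le b,

since the second-order cone constraint is equivalent to \|Wx\|^2 \le 2t,
i.e. t \ge \tfrac{1}{2} x^T H x.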

For the runtime comparisons with ADMM-based decomposition for simpler
constraints, I am doing further profiling to see if the jnilib is causing
any performance issues for the ECOS calls.

Please look at the proximal algorithm reference:
http://web.stanford.edu/~boyd/papers/pdf/prox_algs.pdf. For many problems
(L1 constraint, positivity, bounds, Huber, hyperplane projection, etc.) the
proximal operator is simple to evaluate, and for these cases ADMM
decomposition has been shown to run faster than standard constraint solvers
like IPMs. I am not very surprised that sparse NMF runs in 4x the runtime
of least squares using ADMM decomposition.
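
To illustrate how cheap these proximal operators are, here is a sketch in
plain Scala (illustrative only, not the code under discussion):

object Prox {
  // prox of lambda * ||x||_1: elementwise soft-thresholding
  def l1(x: Array[Double], lambda: Double): Array[Double] =
    x.map(v => math.signum(v) * math.max(math.abs(v) - lambda, 0.0))

  // prox of the indicator of {lb <= x <= ub}: projection onto the box
  def box(x: Array[Double], lb: Double, ub: Double): Array[Double] =
    x.map(v => math.min(math.max(v, lb), ub))
}

Each ADMM iteration then needs only one such elementwise pass plus a linear
solve, instead of a full constrained solve.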

Distributed consensus is another ADMM decomposition that we are working
with right now; we will have some results on that soon. There the idea is
to use ADMM so that the accumulator need not collect dense gradient vectors
from each worker. This development will further help the treeReduce work.
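
A sketch of one consensus round, following the proximal-algorithms paper
above (argminLocal and proxG are illustrative stand-ins for the
problem-specific pieces, not an existing API):

def consensusRound(
    xs: Array[Array[Double]],             // local primal variables x_i
    us: Array[Array[Double]],             // local scaled dual variables u_i
    z: Array[Double],                     // current consensus variable
    argminLocal: (Int, Array[Double]) => Array[Double], // x_i-update toward (z - u_i)
    proxG: Array[Double] => Array[Double]               // prox of the shared term g
): Array[Double] = {
  val n = z.length
  for (i <- xs.indices)                   // independent local solves, one per worker
    xs(i) = argminLocal(i, Array.tabulate(n)(j => z(j) - us(i)(j)))
  val avg = new Array[Double](n)          // xbar + ubar, the only data gathered
  for (i <- xs.indices; j <- 0 until n) avg(j) += (xs(i)(j) + us(i)(j)) / xs.length
  val zNew = proxG(avg)                   // global consensus update
  for (i <- xs.indices; j <- 0 until n) us(i)(j) += xs(i)(j) - zNew(j) // dual update
  zNew
}

Only z and the small averages move between workers, which is the point made
above about not accumulating dense gradients.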

Should I open Spark JIRAs so that we can document the quadratic
minimization runtime experiments/benchmarks and share the code for review?

Most likely the core solvers will go to breeze, and in Spark mllib
optimization I will add a QpSolver object that calls the underlying breeze
solvers based on the problem complexity. The ECOS jnilib can be part of
breeze natives since it is GPL licensed (same as the netlib-java
jniloader). I will push the jnilib as a PR to the ECOS repository:
https://github.com/ifa-ethz/ecos
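
A hypothetical sketch of that dispatch (the trait and object names are
illustrative, not an existing breeze or mllib API; the stubs mark where the
real solvers would plug in):

trait SubproblemSolver {
  /** Each ALS subproblem is (A, b) -> x with A = Y^T Y + lambda*I, b = Y^T r. */
  def solve(ata: Array[Double], atb: Array[Double], rank: Int): Array[Double]
}

object QpSolver {
  /** Pick the cheapest solver that can handle the requested constraints. */
  def select(hasBounds: Boolean, hasConic: Boolean): SubproblemSolver =
    (hasBounds, hasConic) match {
      case (false, false) => Cholesky  // unconstrained: direct dposv-style solve
      case (true, false)  => Admm      // simple proximal case: ADMM decomposition
      case _              => Ipm       // general conic case: ECOS-style IPM
    }

  private object Cholesky extends SubproblemSolver {
    def solve(ata: Array[Double], atb: Array[Double], rank: Int) =
      sys.error("stub: plug in the direct positive-definite solve here")
  }
  private object Admm extends SubproblemSolver {
    def solve(ata: Array[Double], atb: Array[Double], rank: Int) =
      sys.error("stub: plug in the proximal/ADMM path here")
  }
  private object Ipm extends SubproblemSolver {
    def solve(ata: Array[Double], atb: Array[Double], rank: Int) =
      sys.error("stub: plug in the ECOS-style interior point path here")
  }
}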

Thanks.

Deb

Re: Constraint Solver for Spark

Posted by Xiangrui Meng <me...@gmail.com>.
Hi Deb,

KNITRO and MOSEK are both commercial. If you are looking for
open-source ones, you can take a look at PDCO from SOL:

http://web.stanford.edu/group/SOL/software/pdco/

Each subproblem is really just a small QP. ADMM is designed for cases where
the data is stored distributively or the objective function is complex but
splittable. Neither applies to this case.

Best,
Xiangrui


Re: Constraint Solver for Spark

Posted by Debasish Das <de...@gmail.com>.
Hi Xiangrui,

Could you please point me to the IPM solver that you have had positive
results with? I was planning to compare with CVX, KNITRO from Professor
Nocedal, and probably MOSEK. I don't have a CPLEX license, so I won't be
able to do that comparison.

My experiments so far tell me that an ADMM-based solver is faster than an
IPM for simpler constraints, but then perhaps I did not choose the right
IPM.

The proximal algorithm paper also shows very similar results compared to
CVX:

http://web.stanford.edu/~boyd/papers/pdf/prox_algs.pdf

Thanks.
Deb


Re: Constraint Solver for Spark

Posted by Xiangrui Meng <me...@gmail.com>.
Your idea is close to what implicit feedback does. You can check the
paper, which is short and concise. In the ALS setting, all subproblems
are independent in each iteration. This is part of the reason why ALS
is scalable. If you have some global constraints that make the
subproblems no longer decoupled, that would certainly affect
scalability. -Xiangrui
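
For explicit feedback, the subproblems also differ per user, as noted
further down the thread: each user's normal equations use only that user's
observed ratings. A minimal sketch of that, in plain Scala rather than the
mllib code:

// Build A = sum over rated items of y y^T + lambda*I and b = sum of r * y
// for one user; different users rate different items, so A differs per user.
def userGram(ratedItems: Array[Int], ratings: Array[Double],
             itemFactors: Array[Array[Double]], rank: Int, lambda: Double)
    : (Array[Double], Array[Double]) = {
  val ata = new Array[Double](rank * rank)
  val atb = new Array[Double](rank)
  for ((item, idx) <- ratedItems.zipWithIndex) {
    val y = itemFactors(item)
    for (i <- 0 until rank) {
      atb(i) += ratings(idx) * y(i)
      for (j <- 0 until rank) ata(j * rank + i) += y(i) * y(j)
    }
  }
  for (i <- 0 until rank) ata(i * rank + i) += lambda // regularization
  (ata, atb)
}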


Re: Constraint Solver for Spark

Posted by Debasish Das <de...@gmail.com>.
I got it: the ALS formulation is solving the matrix completion problem.

To convert the problem to matrix factorization, or to take user feedback
(missing entries mean the user hates the site?), we should put 0 at the
missing entries (or maybe -1). In that case we have to use computeYtY and
accumulate over the users in each block to generate the full Gram matrix,
and after that, while computing userXy(index), we have to be careful to put
0/-1 for the rest of the features.

Is implicit feedback trying to do something like this?

Basically I am trying to see whether it is possible to cache the Gram
matrix and its Cholesky factorization, and then call the QpSolver multiple
times with an updated gradient term. I am expecting better runtimes than
dposv when the ranks are high.

But it seems that's not possible without a broadcast step, which might kill
all the runtime gain.
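
A minimal sketch of the caching idea, in plain Scala with no library
assumptions: factor the Gram matrix A = L L^T once, then reuse L for
repeated solves with updated right-hand sides.

object CachedCholesky {
  /** Cholesky of a dense column-major n x n SPD matrix; returns the lower factor L. */
  def factor(a: Array[Double], n: Int): Array[Double] = {
    val l = new Array[Double](n * n)
    for (j <- 0 until n; i <- j until n) {
      var s = a(j * n + i)
      for (k <- 0 until j) s -= l(k * n + i) * l(k * n + j)
      l(j * n + i) = if (i == j) math.sqrt(s) else s / l(j * n + j)
    }
    l
  }

  /** Solve L L^T x = b by forward then backward substitution, reusing L. */
  def solve(l: Array[Double], n: Int, b: Array[Double]): Array[Double] = {
    val y = new Array[Double](n)
    for (i <- 0 until n) {
      var s = b(i)
      for (k <- 0 until i) s -= l(k * n + i) * y(k)
      y(i) = s / l(i * n + i)
    }
    val x = new Array[Double](n)
    for (i <- (n - 1) to 0 by -1) {
      var s = y(i)
      for (k <- i + 1 until n) s -= l(i * n + k) * x(k)
      x(i) = s / l(i * n + i)
    }
    x
  }
}

factor is O(n^3) but each reuse of solve is only O(n^2), which is where the
hoped-for win over repeated dposv calls would come from.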


On Wed, Jun 11, 2014 at 12:21 AM, Xiangrui Meng <me...@gmail.com> wrote:

> For explicit feedback, ALS uses only observed ratings for computation.
> So XtXs are not the same. -Xiangrui
>
> On Tue, Jun 10, 2014 at 8:58 PM, Debasish Das <de...@gmail.com> wrote:
> > Sorry, the last one went out by mistake: isn't fullXtX the same for all
> > users (0 to numUsers)? In the ALS formulation this is W^T W or H^T H,
> > which should be the same for all users. Why are we reading
> > userXtX(index) and adding it to fullXtX in the loop over all numUsers?
> >
> > // Solve the least-squares problem for each user and return the new
> > // feature vectors
> > Array.range(0, numUsers).map { index =>
> >   // Compute the full XtX matrix from the lower-triangular part we got above
> >   fillFullMatrix(userXtX(index), fullXtX)
> >   // Add regularization
> >   var i = 0
> >   while (i < rank) {
> >     fullXtX.data(i * rank + i) += lambda
> >     i += 1
> >   }
> >   // Solve the resulting matrix, which is symmetric and positive-definite
> >   algo match {
> >     case ALSAlgo.Implicit =>
> >       Solve.solvePositive(fullXtX.addi(YtY.get.value), userXy(index)).data
> >     case ALSAlgo.Explicit =>
> >       Solve.solvePositive(fullXtX, userXy(index)).data
> >   }
> > }
> >
> >> On Fri, Jun 6, 2014 at 10:42 AM, Debasish Das <debasish.das83@gmail.com> wrote:
> >>
> >>> Hi Xiangrui,
> >>>
> >>> It's not a linear constraint; it is a quadratic inequality with slack,
> >>> a first-order Taylor approximation of the off-diagonal cross terms,
> >>> and a cyclic coordinate descent, which we think will yield
> >>> orthogonality. It's still a work in progress.
> >>>
> >>> Also, we want to put an L1 constraint as a set of linear equations
> >>> when solving for ALS.
> >>>
> >>> I will create the JIRA. As I see it, this will evolve into a generic
> >>> constraint solver for machine learning problems that have a QP
> >>> structure; ALS is one example, and kernel SVMs are another.
> >>>
> >>> I did not know that an LGPL solver can be added to the classpath. If
> >>> it can be, then we should definitely add these in ALS.scala.
> >>>
> >>> Thanks.
> >>> Deb
> >>>
> >>>
> >>>
> >>> On Thu, Jun 5, 2014 at 11:31 PM, Xiangrui Meng <me...@gmail.com> wrote:
> >>>
> >>>> I don't quite understand why putting linear constraints can promote
> >>>> orthogonality. For the interfaces, if the subproblem is determined by
> >>>> Y^T Y and Y^T b for each iteration, then the least squares solver, the
> >>>> non-negative least squares solver, or your convex solver is simply a
> >>>> function
> >>>>
> >>>> (A, b) -> x.
> >>>>
> >>>> You can define it as an interface, and make the solver pluggable by
> >>>> adding a setter to ALS. If you want to use your lgpl solver, just
> >>>> include it in the classpath. Creating two separate files still seems
> >>>> unnecessary to me. Could you create a JIRA and we can move our
> >>>> discussion there? Thanks!
> >>>>
> >>>> Best,
> >>>> Xiangrui
> >>>>
> >>>> On Thu, Jun 5, 2014 at 7:20 PM, Debasish Das <debasish.das83@gmail.com> wrote:
> >>>> > Hi Xiangrui,
> >>>> >
> >>>> > For orthogonality properties in the factors we need a constraint
> >>>> > solver beyond the usual ones (L1, upper and lower bounds, L2, etc.).
> >>>> >
> >>>> > The interface of a constraint solver is standard, and I can add it
> >>>> > in mllib optimization.
> >>>> >
> >>>> > But I am not sure how I will call the GPL-licensed IPM solver from
> >>>> > mllib. Assume the solver interface is as follows:
> >>>> >
> >>>> > QpSolver(denseMatrix h, array[double] f, int linearEquality,
> >>>> >          int linearInequality, bool lb, bool ub)
> >>>> >
> >>>> > And then I have functions to update the equalities, inequalities,
> >>>> > bounds, etc., followed by the run which generates the solution.
> >>>> >
> >>>> > For L1 constraints I have to use an epigraph formulation, which
> >>>> > needs a variable transformation before the solve.
> >>>> >
> >>>> > I was thinking that for problems that do not need constraints,
> >>>> > people will use ALS.scala, and ConstrainedALS.scala will have the
> >>>> > constrained formulations.
> >>>> >
> >>>> > I can point you to the code once it is ready, and then you can
> >>>> > guide me on how to refactor it into mllib ALS?
> >>>> >
> >>>> > Thanks.
> >>>> > Deb
> >>>> > Hi Deb,
> >>>> >
> >>>> > Why do you want to make those methods public? If you only need to
> >>>> > replace the solver for subproblems. You can try to make the solver
> >>>> > pluggable. Now it supports least squares and non-negative least
> >>>> > squares. You can define an interface for the subproblem solvers and
> >>>> > maintain the IPM solver at your own code base, if the only
> information
> >>>> > you need is Y^T Y and Y^T b.
> >>>> >
> >>>> > Btw, just curious, what is the use case for quadratic constraints?
> >>>> >
> >>>> > Best,
> >>>> > Xiangrui
> >>>> >
> >>>> > On Thu, Jun 5, 2014 at 3:38 PM, Debasish Das <
> debasish.das83@gmail.com
> >>>> >
> >>>> > wrote:
> >>>> >> Hi,
> >>>> >>
> >>>> >> We are adding a constrained ALS solver in Spark to solve matrix
> >>>> >> factorization use-cases which needs additional constraints (bounds,
> >>>> >> equality, inequality, quadratic constraints)
> >>>> >>
> >>>> >> We are using a native version of a primal dual SOCP solver due to
> its
> >>>> > small
> >>>> >> memory footprint and sparse ccs matrix computation it uses...The
> >>>> solver
> >>>> >> depends on AMD and LDL packages from Timothy Davis for sparse ccs
> >>>> matrix
> >>>> >> algebra (released under lgpl)...
> >>>> >>
> >>>> >> Due to GPL dependencies, it won't be possible to release the code
> as
> >>>> > Apache
> >>>> >> license for now...If we get good results on our use-cases, we will
> >>>> plan to
> >>>> >> write a version in breeze/modify joptimizer for sparse ccs
> >>>> operations...
> >>>> >>
> >>>> >> I derived ConstrainedALS from Spark mllib ALS and I am comparing
> the
> >>>> >> performance with default ALS and non-negative ALS as baseline. Plan
> >>>> is to
> >>>> >> release the code as GPL license for community review...I have kept
> the
> >>>> >> package structure as org.apache.spark.mllib.recommendation
> >>>> >>
> >>>> >> There are some private functions defined in ALS, which I would
> like to
> >>>> >> reuse....Is it possible to take the private out from the following
> >>>> >> functions:
> >>>> >>
> >>>> >> 1. makeLinkRDDs
> >>>> >> 2. makeInLinkBlock
> >>>> >> 3. makeOutLinkBlock
> >>>> >> 4. randomFactor
> >>>> >> 5. unblockFactors
> >>>> >>
> >>>> >> I don't want to copy any code.... I can ask for a PR to make these
> >>>> >> changes...
> >>>> >>
> >>>> >> Thanks.
> >>>> >> Deb
> >>>>
> >>>
> >>>
> >>
>

Re: Constraint Solver for Spark

Posted by Xiangrui Meng <me...@gmail.com>.
For explicit feedback, ALS uses only observed ratings for computation.
So the per-user XtX matrices are not the same. -Xiangrui
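
To illustrate with a minimal sketch (hypothetical names, not the actual
MLlib code): for explicit feedback, user u's normal equations are built
only from the items u rated, so both sides differ per user.

import breeze.linalg.{DenseMatrix, DenseVector}

// A_u = sum over j rated by u of y_j * y_j^T (the per-user XtX), and
// b_u = sum over j rated by u of r_uj * y_j; both vary with the user.
def userGram(ratedItems: Array[Int], ratings: Array[Double],
             itemFactors: Array[DenseVector[Double]])
    : (DenseMatrix[Double], DenseVector[Double]) = {
  val rank = itemFactors(0).length
  val a = DenseMatrix.zeros[Double](rank, rank)
  val b = DenseVector.zeros[Double](rank)
  for (k <- ratedItems.indices) {
    val yj = itemFactors(ratedItems(k))
    a += yj * yj.t        // outer product of the item factor with itself
    b += yj * ratings(k)
  }
  (a, b)
}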

On Tue, Jun 10, 2014 at 8:58 PM, Debasish Das <de...@gmail.com> wrote:
> Sorry, the last one went out by mistake:
>
> Isn't fullXtX the same for all users (0 to numUsers)? In the ALS
> formulation this is W^T W or H^T H, which should be the same for all
> users. Why are we reading userXtX(index) and filling fullXtX from it
> inside the loop over all numUsers?

Re: Constraint Solver for Spark

Posted by Debasish Das <de...@gmail.com>.
Sorry, the last one went out by mistake:

Isn't fullXtX the same for all users (0 to numUsers)? In the ALS formulation
this is W^T W or H^T H, which should be the same for all users. Why are we
reading userXtX(index) and filling fullXtX from it inside the loop over all
numUsers?

// Solve the least-squares problem for each user and return the new feature vectors

    Array.range(0, numUsers).map { index =>
      // Compute the full XtX matrix from the lower-triangular part we got above
      fillFullMatrix(userXtX(index), fullXtX)
      // Add regularization
      var i = 0
      while (i < rank) {
        fullXtX.data(i * rank + i) += lambda
        i += 1
      }
      // Solve the resulting matrix, which is symmetric and positive-definite
      algo match {
        case ALSAlgo.Implicit =>
          Solve.solvePositive(fullXtX.addi(YtY.get.value), userXy(index)).data
        case ALSAlgo.Explicit =>
          Solve.solvePositive(fullXtX, userXy(index)).data
      }
    }



Re: Constraint Solver for Spark

Posted by Debasish Das <de...@gmail.com>.
Hi,

I am a bit confused with the code here:

// Solve the least-squares problem for each user and return the new feature vectors

    Array.range(0, numUsers).map { index =>
      // Compute the full XtX matrix from the lower-triangular part we got above
      fillFullMatrix(userXtX(index), fullXtX)
      // Add regularization
      var i = 0
      while (i < rank) {
        fullXtX.data(i * rank + i) += lambda
        i += 1
      }
      // Solve the resulting matrix, which is symmetric and positive-definite
      algo match {
        case ALSAlgo.Implicit =>
          Solve.solvePositive(fullXtX.addi(YtY.get.value), userXy(index)).data
        case ALSAlgo.Explicit =>
          Solve.solvePositive(fullXtX, userXy(index)).data
      }
    }



Re: Constraint Solver for Spark

Posted by Debasish Das <de...@gmail.com>.
Hi Xiangrui,

It's not a linear constraint. It is a quadratic inequality with slack, a
first-order Taylor approximation of the off-diagonal cross terms, and a
cyclic coordinate descent, which we think will yield orthogonality... It's
still a work in progress...
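
Roughly, the idea (only a sketch; the exact formulation is still in flux):
to push a factor w_i toward orthogonality with the other columns, add a
quadratic inequality with slack s_i,

  w_i^T Q_i w_i <= s_i,  where Q_i = sum over j != i of w_j w_j^T,

take a first-order Taylor approximation of the cross terms around the
current iterate w_i^0,

  w_i^T Q_i w_i ~= (w_i^0)^T Q_i w_i^0 + 2 (Q_i w_i^0)^T (w_i - w_i^0),

and cycle over the columns i in coordinate-descent fashion.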

Also, we want to impose an L1 constraint as a set of linear constraints when
solving ALS...

I will create the JIRA... As I see it, this will evolve into a generic
constraint solver for machine learning problems that have a QP
structure... ALS is one example; another is kernel SVMs...

I did not know that an LGPL solver can be added to the classpath... If it
can be, then we should definitely add these in ALS.scala...

Thanks.
Deb




Re: Constraint Solver for Spark

Posted by Xiangrui Meng <me...@gmail.com>.
I don't quite understand why putting linear constraints can promote
orthogonality. For the interfaces, if the subproblem is determined by
Y^T Y and Y^T b for each iteration, then the least squares solver, the
non-negative least squares solver, or your convex solver is simply a
function

(A, b) -> x.

You can define it as an interface, and make the solver pluggable by
adding a setter to ALS. If you want to use your LGPL solver, just
include it in the classpath. Creating two separate files still seems
unnecessary to me. Could you create a JIRA and we can move our
discussion there? Thanks!
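
As a minimal sketch (names hypothetical, not the current API), the
pluggable interface could look like this:

// A subproblem solver is just a function (A, b) -> x, where
// A = Y^T Y (rank x rank, packed row-major) and b = Y^T r.
trait SubproblemSolver extends Serializable {
  def solve(ata: Array[Double], atb: Array[Double], rank: Int): Array[Double]
}

// ALS keeps one instance and exposes a setter; the existing least-squares
// and NNLS paths then become two implementations of the same trait.
class PluggableALS(private var solver: SubproblemSolver) {
  def setSolver(s: SubproblemSolver): this.type = {
    this.solver = s
    this
  }
}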

Best,
Xiangrui


Re: Constraint Solver for Spark

Posted by Debasish Das <de...@gmail.com>.
Hi Xiangrui,

For orthogonality properties in the factors we need a constraint solver
beyond the usual ones (L1, upper and lower bounds, L2, etc.).

The interface of a constraint solver is standard, and I can add it in mllib
optimization...

But I am not sure how I will call the GPL-licensed IPM solver from
mllib... Assume the solver interface is as follows:

QpSolver(DenseMatrix h, Array[Double] f, Int linearEquality, Int
linearInequality, Boolean lb, Boolean ub)

And then I have functions to update the equalities, inequalities, bounds,
etc., followed by a run which generates the solution...
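
In Scala terms, a sketch of what I mean (my own naming, nothing final):

import breeze.linalg.DenseMatrix

// QP: minimize 0.5 * x^T H x + f^T x, subject to linearEquality equality
// rows, linearInequality inequality rows, and optional bounds on x.
class QpSolver(h: DenseMatrix[Double], f: Array[Double],
               linearEquality: Int, linearInequality: Int,
               lb: Boolean, ub: Boolean) {
  def updateEqualities(aeq: DenseMatrix[Double], beq: Array[Double]): Unit = ???
  def updateInequalities(a: DenseMatrix[Double], b: Array[Double]): Unit = ???
  def updateBounds(lower: Array[Double], upper: Array[Double]): Unit = ???
  def run(): Array[Double] = ???  // the IPM solve, returning the minimizer x
}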

For L1 constraints I have to use the epigraph formulation, which needs a
variable transformation before the solve...
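
(Concretely, the standard epigraph trick: to impose ||x||_1 <= c, introduce
auxiliary variables t_1, ..., t_n and solve over (x, t) subject to

  -t_i <= x_i <= t_i for all i,  and  t_1 + ... + t_n <= c,

which are all linear inequalities, so they fit the QpSolver interface
above; the variable transformation is just the expansion from x to (x, t).)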

I was thinking that for problems that do not need constraints people will
use ALS.scala, and ConstrainedALS.scala will have the constrained
formulations...

I can point you to the code once it is ready, and then you can guide me on
how to refactor it into mllib ALS?

Thanks.
Deb

Re: Constraint Solver for Spark

Posted by Xiangrui Meng <me...@gmail.com>.
Hi Deb,

Why do you want to make those methods public, if you only need to replace
the solver for the subproblems? You can try to make the solver
pluggable. Now it supports least squares and non-negative least
squares. You can define an interface for the subproblem solvers and
maintain the IPM solver in your own code base, if the only information
you need is Y^T Y and Y^T b.
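
For example, a minimal sketch in breeze (illustrative only, not the MLlib
implementation), assuming the unconstrained subproblem reduces to the
regularized normal equations:

import breeze.linalg.{DenseMatrix, DenseVector}

// Solve (Y^T Y + lambda * I) x = Y^T b for one factor vector x.
def solveSubproblem(ytY: DenseMatrix[Double], ytb: DenseVector[Double],
                    lambda: Double): DenseVector[Double] = {
  val a = ytY + DenseMatrix.eye[Double](ytY.rows) * lambda
  a \ ytb  // dense linear solve
}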

Btw, just curious, what is the use case for quadratic constraints?

Best,
Xiangrui
