Posted to user@spark.apache.org by Jaonary Rabarisoa <ja...@gmail.com> on 2015/03/03 18:01:22 UTC

Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Dear all,

Is there a least-squares solver based on DistributedMatrix that we can use
out of the box in the current (or master) version of Spark?
It seems that the only least-squares solver available in Spark is private to
the recommendation package.


Cheers,

Jao

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Posted by Burak Yavuz <br...@gmail.com>.
Hi Jaonary,

A RowPartitionedMatrix is a special case of a BlockMatrix in which
colsPerBlock = nCols. I hope that helps.
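
To make the relationship concrete, here is a minimal sketch in plain Python, with lists standing in for distributed blocks. The helper name to_blocks and the dictionary layout are hypothetical and are not part of MLlib or ml-matrix; the only point is that setting the block width to the full column count leaves one block column, so each block holds complete rows.

```python
# Illustrative only: plain Python lists stand in for distributed blocks.
# to_blocks is a hypothetical helper, not an MLlib or ml-matrix API.

def to_blocks(matrix, rows_per_block, cols_per_block):
    """Split a dense matrix (list of rows) into blocks keyed by (block_row, block_col)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    blocks = {}
    for r0 in range(0, n_rows, rows_per_block):
        for c0 in range(0, n_cols, cols_per_block):
            block = [row[c0:c0 + cols_per_block]
                     for row in matrix[r0:r0 + rows_per_block]]
            blocks[(r0 // rows_per_block, c0 // cols_per_block)] = block
    return blocks

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]]
n_cols = len(A[0])

# General block layout: 2 x 2 blocks, so there are two block columns.
general = to_blocks(A, rows_per_block=2, cols_per_block=2)

# Row-partitioned layout: cols_per_block equals n_cols, so there is a single
# block column and every block contains whole rows.
row_partitioned = to_blocks(A, rows_per_block=2, cols_per_block=n_cols)
```

With cols_per_block = n_cols the keys are (0, 0) and (1, 0) only, which is the row-partitioned case described above.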

Burak
On Mar 6, 2015 9:13 AM, "Jaonary Rabarisoa" <ja...@gmail.com> wrote:

> Hi Shivaram,
>
> Thank you for the link. I'm trying to figure out how can I port this to
> mllib. May you can help me to understand how pieces fit together.
> Currently, in mllib there's different types of distributed matrix :
>
> BlockMatrix, CoordinateMatrix, IndexedRowMatrix and RowMatrix. Which one
> should correspond to RowPartitionedMatrix in ml-matrix ?

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Posted by Jaonary Rabarisoa <ja...@gmail.com>.
Hi Shivaram,

Thank you for the link. I'm trying to figure out how I can port this to
MLlib. Maybe you can help me understand how the pieces fit together.
Currently, MLlib has several types of distributed matrix:

BlockMatrix, CoordinateMatrix, IndexedRowMatrix and RowMatrix. Which one
should correspond to RowPartitionedMatrix in ml-matrix?



On Tue, Mar 3, 2015 at 8:02 PM, Shivaram Venkataraman <
shivaram@eecs.berkeley.edu> wrote:

> There are couple of solvers that I've written that is part of the AMPLab
> ml-matrix repo [1,2]. These aren't part of MLLib yet though and if you are
> interested in porting them I'd be happy to review it
>
> Thanks
> Shivaram
>
>
> [1]
> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
> [2]
> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
Sections 3, 4, and 5 of http://www.netlib.org/lapack/lawnspdf/lawn204.pdf are a
good reference.
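
As a rough illustration of the tall-skinny QR idea (factor each row block locally, stack the small R factors, factor again), here is a pure-Python sketch with a single reduction step. It is not the code in TSQR.scala. The function names are hypothetical, and appending b as an extra column so that the final R also carries Q^T b is a convenience chosen for this sketch.

```python
import math

def r_factor(M):
    """Return the R factor of a Householder QR of M (a list of rows)."""
    R = [row[:] for row in M]
    m, n = len(R), len(R[0])
    for k in range(min(m, n)):
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue  # column is already zero from the diagonal down
        alpha = -norm if R[k][k] >= 0 else norm
        v = [R[i][k] for i in range(k, m)]
        v[0] -= alpha
        vnorm2 = sum(vi * vi for vi in v)
        # Apply the reflector I - 2 v v^T / (v^T v) to columns k..n-1.
        for j in range(k, n):
            dot = sum(v[i] * R[k + i][j] for i in range(m - k))
            f = 2.0 * dot / vnorm2
            for i in range(m - k):
                R[k + i][j] -= f * v[i]
    return R[:min(m, n)]

def tsqr_lstsq(blocks_A, blocks_b):
    """Least squares for row-partitioned A and a single right-hand side b."""
    n = len(blocks_A[0][0])
    # Step 1: local QR of each augmented block [A_i | b_i]; keep only R_i.
    local_rs = [r_factor([row + [bi] for row, bi in zip(A_i, b_i)])
                for A_i, b_i in zip(blocks_A, blocks_b)]
    # Step 2: stack the small R factors and factor once more.
    R = r_factor([row for R_i in local_rs for row in R_i])
    # Step 3: back substitution on R[:n, :n] x = R[:n, n].
    x = [0.0] * n
    for i in reversed(range(n)):
        s = R[i][n] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x

# Fit b = 1 + 2 t for t = 0..5, with the rows split into two blocks.
blocks_A = [[[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]],
            [[1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]]
blocks_b = [[1.0, 3.0, 5.0], [7.0, 9.0, 11.0]]
x = tsqr_lstsq(blocks_A, blocks_b)
```

Only the small R factors leave each block, which is what makes the scheme attractive when A has many more rows than columns. A real implementation would combine the R factors in a tree rather than in one step.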

Shivaram
On Mar 6, 2015 9:17 AM, "Jaonary Rabarisoa" <ja...@gmail.com> wrote:

> Do you have a reference paper to the implemented algorithm in TSQR.scala ?

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Posted by Jaonary Rabarisoa <ja...@gmail.com>.
Do you have a reference paper for the algorithm implemented in TSQR.scala?

On Tue, Mar 3, 2015 at 8:02 PM, Shivaram Venkataraman <
shivaram@eecs.berkeley.edu> wrote:

> There are couple of solvers that I've written that is part of the AMPLab
> ml-matrix repo [1,2]. These aren't part of MLLib yet though and if you are
> interested in porting them I'd be happy to review it
>
> Thanks
> Shivaram
>
>
> [1]
> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
> [2]
> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
Do you have a small test case that can reproduce the out-of-memory error?
I have also seen some errors in large-scale experiments but haven't managed
to narrow them down.

Thanks
Shivaram

On Fri, Mar 13, 2015 at 6:20 AM, Jaonary Rabarisoa <ja...@gmail.com>
wrote:

> It runs faster but there is some drawbacks. It seems to consume more
> memory. I get java.lang.OutOfMemoryError: Java heap space error if I don't
> have a sufficient partitions for a fixed amount of memory. With the older
> (ampcamp) implementation for the same data size I didn't get it.

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Posted by Jaonary Rabarisoa <ja...@gmail.com>.
It runs faster, but there are some drawbacks. It seems to consume more
memory: I get a java.lang.OutOfMemoryError: Java heap space error if I don't
have enough partitions for a fixed amount of memory. With the older
(AMPCamp) implementation I didn't get this error for the same data size.

On Thu, Mar 12, 2015 at 11:36 PM, Shivaram Venkataraman <
shivaram@eecs.berkeley.edu> wrote:

>
> On Thu, Mar 12, 2015 at 3:05 PM, Jaonary Rabarisoa <ja...@gmail.com>
> wrote:
>
>> In fact, by activating netlib with native libraries it goes faster.
>>
> Glad you got it to work! Better performance was one of the reasons we made
> the switch.
>
>> Thanks

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
On Thu, Mar 12, 2015 at 3:05 PM, Jaonary Rabarisoa <ja...@gmail.com>
wrote:

> In fact, by activating netlib with native libraries it goes faster.
>

Glad you got it to work! Better performance was one of the reasons we made
the switch.

> Thanks

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Posted by Jaonary Rabarisoa <ja...@gmail.com>.
In fact, after enabling netlib with native libraries, it runs faster.

Thanks

On Tue, Mar 10, 2015 at 7:03 PM, Shivaram Venkataraman <
shivaram@eecs.berkeley.edu> wrote:

> There are a couple of differences between the ml-matrix implementation and
> the one used in AMPCamp
>
> - I think the AMPCamp one uses JBLAS which tends to ship native BLAS
> libraries along with it. In ml-matrix we switched to using Breeze + Netlib
> BLAS which is faster but needs some setup [1] to pick up native libraries.
> If native libraries are not found it falls back to a JVM implementation, so
> that might explain the slow down.
>
> - The other difference if you are comparing the whole image pipeline is
> that I think the AMPCamp version used NormalEquations which is around 2-3x
> faster (just in terms of number of flops) compared to TSQR.
>
> [1]
> https://github.com/fommil/netlib-java#machine-optimised-system-libraries
>
> Thanks
> Shivaram

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
There are a couple of differences between the ml-matrix implementation and
the one used in AMPCamp:

- I think the AMPCamp one uses JBLAS, which tends to ship native BLAS
libraries along with it. In ml-matrix we switched to Breeze + Netlib
BLAS, which is faster but needs some setup [1] to pick up native libraries.
If native libraries are not found, it falls back to a JVM implementation, so
that might explain the slowdown.

- The other difference, if you are comparing the whole image pipeline, is
that I think the AMPCamp version used NormalEquations, which is around 2-3x
faster (just in terms of number of flops) than TSQR.

[1] https://github.com/fommil/netlib-java#machine-optimised-system-libraries
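
As a rough sanity check on the flop comparison, the standard textbook operation counts for dense m x n least squares can be evaluated at the matrix size mentioned in this thread. The counts below are the usual leading-order estimates for a Cholesky-based normal-equations solve and for Householder QR. They are assumptions of this sketch rather than measurements of either implementation, and they ignore the right-hand sides and any extra work in a TSQR reduction tree.

```python
# Leading-order flop estimates for an m x n least-squares problem (m >> n).
# These are textbook counts, not measurements of ml-matrix or AMPCamp code.
m, n = 50_000, 1024  # size of A quoted in this thread

# Normal equations: form A^T A exploiting symmetry (~ m n^2), then Cholesky (~ n^3 / 3).
normal_eq_flops = m * n**2 + n**3 / 3

# Householder QR of A: ~ 2 m n^2 - (2/3) n^3.
householder_qr_flops = 2 * m * n**2 - 2 * n**3 / 3

ratio = householder_qr_flops / normal_eq_flops
print(f"QR / normal equations flop ratio: {ratio:.2f}")
```

For this shape the ratio comes out close to 2, the low end of the range above. Additional factorization work in the TSQR reduction would push it higher.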

Thanks
Shivaram

On Tue, Mar 10, 2015 at 9:57 AM, Jaonary Rabarisoa <ja...@gmail.com>
wrote:

> I'm trying to play with the implementation of least square solver (Ax = b)
> in mlmatrix.TSQR where A is  a 50000*1024 matrix  and b a 50000*10 matrix.
> It works but I notice
> that it's 8 times slower than the implementation given in the latest
> ampcamp :
> http://ampcamp.berkeley.edu/5/exercises/image-classification-with-pipelines.html
> . As far as I know these two implementations come from the same basis.
> What is the difference between these two codes ?

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Posted by Jaonary Rabarisoa <ja...@gmail.com>.
I'm trying out the least-squares solver (Ax = b) in mlmatrix.TSQR, where A
is a 50000*1024 matrix and b is a 50000*10 matrix. It works, but I notice
that it's 8 times slower than the implementation given in the latest
AMPCamp:
http://ampcamp.berkeley.edu/5/exercises/image-classification-with-pipelines.html
As far as I know, these two implementations come from the same code base.
What is the difference between them?

On Tue, Mar 3, 2015 at 8:02 PM, Shivaram Venkataraman <
shivaram@eecs.berkeley.edu> wrote:

> There are couple of solvers that I've written that is part of the AMPLab
> ml-matrix repo [1,2]. These aren't part of MLLib yet though and if you are
> interested in porting them I'd be happy to review it
>
> Thanks
> Shivaram
>
>
> [1]
> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
> [2]
> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Posted by Joseph Bradley <jo...@databricks.com>.
The minimization problem you're describing in the email title also looks
like it could be solved using the RidgeRegression solver in MLlib, once you
transform your DistributedMatrix into an RDD[LabeledPoint].
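
To illustrate the shape of that transformation, here is a small pure-Python sketch in which each row of A is paired with the matching entry of b as a (label, features) tuple, a stand-in for LabeledPoint, and the objective from the subject line is minimized by full-batch gradient descent. This is not MLlib's solver. The function name, step size and iteration count are hypothetical, n is assumed to be the number of rows, and MLlib may scale its regularization parameter differently. Since a LabeledPoint carries a single numeric label, a b with several columns would presumably need one fit per column.

```python
# Sketch only: plain tuples stand in for LabeledPoint; not MLlib's implementation.

def ridge_gradient_descent(points, lam, step=0.1, iterations=200):
    """Minimize sum_i (a_i . x - b_i)^2 + lam * n * ||x||^2 over (label, features) pairs."""
    n = len(points)          # assumption: n is the number of rows of A
    d = len(points[0][1])
    x = [0.0] * d
    for _ in range(iterations):
        # Gradient of the penalty term: 2 * lam * n * x.
        grad = [2.0 * lam * n * xj for xj in x]
        # Gradient of the squared-error term: 2 * A^T (A x - b), row by row.
        for label, features in points:
            residual = sum(f * xj for f, xj in zip(features, x)) - label
            for j, f in enumerate(features):
                grad[j] += 2.0 * residual * f
        x = [xj - step * g for xj, g in zip(x, grad)]
    return x

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]

# Pair each row of A with its entry of b.
points = [(bi, row) for row, bi in zip(A, b)]
x = ridge_gradient_descent(points, lam=0.1)
```

The fixed step size works here only because the example is tiny and well conditioned; a practical solver would choose it more carefully.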

On Tue, Mar 3, 2015 at 11:02 AM, Shivaram Venkataraman <
shivaram@eecs.berkeley.edu> wrote:

> There are couple of solvers that I've written that is part of the AMPLab
> ml-matrix repo [1,2]. These aren't part of MLLib yet though and if you are
> interested in porting them I'd be happy to review it
>
> Thanks
> Shivaram
>
>
> [1]
> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
> [2]
> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
There are a couple of solvers that I've written that are part of the AMPLab
ml-matrix repo [1,2]. These aren't part of MLlib yet, though, and if you are
interested in porting them I'd be happy to review the port.

Thanks
Shivaram


[1]
https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
[2]
https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala
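
For the objective in the subject line, setting the gradient to zero gives the regularized normal equations (A^T A + lambda * n * I) x = A^T b. The sketch below, in pure Python, accumulates A^T A and A^T b one row block at a time and then solves the small system with a Cholesky factorization. It is an illustration under stated assumptions, not the code in NormalEquations.scala: the function names are hypothetical and n is taken to be the total number of rows.

```python
import math

def gram_and_rhs(block_A, block_b):
    """Return (A_i^T A_i, A_i^T b_i) for one row block."""
    d = len(block_A[0])
    G = [[0.0] * d for _ in range(d)]
    c = [0.0] * d
    for row, bi in zip(block_A, block_b):
        for i in range(d):
            c[i] += row[i] * bi
            for j in range(d):
                G[i][j] += row[i] * row[j]
    return G, c

def cholesky_solve(G, c):
    """Solve G x = c for symmetric positive definite G."""
    d = len(G)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * d
    for i in range(d):                      # forward substitution: L y = c
        y[i] = (c[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * d
    for i in reversed(range(d)):            # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, d))) / L[i][i]
    return x

def ridge_normal_equations(blocks_A, blocks_b, lam):
    d = len(blocks_A[0][0])
    n = sum(len(block) for block in blocks_A)  # assumption: n = total rows
    G = [[0.0] * d for _ in range(d)]
    c = [0.0] * d
    # Each block contributes a small d x d matrix and a length-d vector.
    for block_A, block_b in zip(blocks_A, blocks_b):
        G_i, c_i = gram_and_rhs(block_A, block_b)
        for i in range(d):
            c[i] += c_i[i]
            for j in range(d):
                G[i][j] += G_i[i][j]
    for i in range(d):
        G[i][i] += lam * n                   # add lambda * n * I
    return cholesky_solve(G, c)

blocks_A = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
blocks_b = [[1.0, 2.0], [3.0]]
x = ridge_normal_equations(blocks_A, blocks_b, lam=0.1)
```

Only d x d and length-d quantities need to be combined across blocks, which is why this approach is cheap when A is tall and skinny. The trade-off is that forming A^T A squares the condition number of the problem.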

On Tue, Mar 3, 2015 at 9:01 AM, Jaonary Rabarisoa <ja...@gmail.com> wrote:

> Dear all,
>
> Is there a least square solver based on DistributedMatrix that we can use
> out of the box in the current (or the master) version of spark ?
> It seems that the only least square solver available in spark is private
> to recommender package.
>
>
> Cheers,
>
> Jao
>