Posted to dev@spark.apache.org by "Ulanov, Alexander" <al...@hp.com> on 2015/03/19 04:09:17 UTC

Which linear algebra interface to use within Spark MLlib?

Hi,

I am currently using Breeze within Spark MLlib for linear algebra. I would like to reuse previously allocated matrices to store the result of matrix multiplication, i.e. I need a "gemm" function, C := q*A*B + p*C, which is missing in Breeze (Breeze automatically allocates a new matrix to store the result of a multiplication). I would also like to minimize the number of gemm calls that Breeze makes. Should I use the mllib.linalg.BLAS functions instead? While it has gemm and axpy, it offers a rather limited number of operations. For example, I need to sum a matrix by row or by column, and to apply a function to all elements of a matrix. Also, the MLlib Vector and Matrix interfaces that linalg.BLAS operates on seem rather undeveloped. Should I use plain netlib-java instead (and will it remain in MLlib in future releases)?

Best regards, Alexander
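
For readers unfamiliar with the operation, the in-place update C := q*A*B + p*C requested above can be sketched in plain Java as follows. This is a naive illustration, not the netlib-java, Breeze, or MLlib implementation. The class name InPlaceGemm, the method gemm, and the column-major layout are assumptions made for the example; alpha and beta stand in for q and p.

```java
/** Illustrative only: a naive column-major GEMM that writes into a caller-supplied C. */
final class InPlaceGemm {
    /**
     * Computes C := alpha * A * B + beta * C without allocating a result matrix.
     * A is m x k, B is k x n, C is m x n; all are column-major double arrays.
     */
    static void gemm(int m, int n, int k,
                     double alpha, double[] a, double[] b,
                     double beta, double[] c) {
        if (a.length < m * k || b.length < k * n || c.length < m * n) {
            throw new IllegalArgumentException("array too small for the given dimensions");
        }
        for (int j = 0; j < n; j++) {
            // Scale column j of C by beta; beta == 0 overwrites whatever was there.
            for (int i = 0; i < m; i++) {
                int idx = i + j * m;
                c[idx] = (beta == 0.0) ? 0.0 : beta * c[idx];
            }
            // Accumulate alpha * A * B(:, j) into the same column.
            for (int p = 0; p < k; p++) {
                double scaled = alpha * b[p + j * k];
                for (int i = 0; i < m; i++) {
                    c[i + j * m] += scaled * a[i + p * m];
                }
            }
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 3, 2, 4}; // [[1, 2], [3, 4]] stored by column
        double[] b = {5, 7, 6, 8}; // [[5, 6], [7, 8]] stored by column
        double[] c = {1, 1, 1, 1}; // preallocated output, reused across calls
        gemm(2, 2, 2, 1.0, a, b, 2.0, c);
        System.out.println(java.util.Arrays.toString(c)); // prints [21.0, 45.0, 24.0, 52.0]
    }
}
```

The point of the signature is that the caller owns c: repeated calls in an optimization loop touch the same buffer instead of allocating a new result each time.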

Re: Which linear algebra interface to use within Spark MLlib?

Posted by Debasish Das <de...@gmail.com>.
Hi Burak,

For the local linear algebra package, why are we not extending Breeze?

Breeze is already an MLlib dependency. Also, that way the local linear algebra
package could be used by other Scala-based frontend APIs that do not
necessarily pull in Spark dependencies.

Thanks.
Deb


On Fri, Mar 20, 2015 at 6:54 PM, Burak Yavuz <br...@gmail.com> wrote:

> Hi,
>
> We plan to add a more comprehensive local linear algebra package for MLlib
> 1.4. This local linear algebra package can then easily be extended to
> BlockMatrix to support the same operations in a distributed fashion.
>
> You may find the JIRA to track this here: SPARK-6442
> <https://issues.apache.org/jira/browse/SPARK-6442>
>
> The design doc is here: http://goo.gl/sf5LCE
>
> We would very much appreciate your feedback and input.
>
> Best,
> Burak
>

Re: Which linear algebra interface to use within Spark MLlib?

Posted by Burak Yavuz <br...@gmail.com>.
Hi,

We plan to add a more comprehensive local linear algebra package for MLlib
1.4. This local linear algebra package can then easily be extended to
BlockMatrix to support the same operations in a distributed fashion.

You may find the JIRA to track this here: SPARK-6442
<https://issues.apache.org/jira/browse/SPARK-6442>

The design doc is here: http://goo.gl/sf5LCE

We would very much appreciate your feedback and input.

Best,
Burak

On Thu, Mar 19, 2015 at 3:06 PM, Debasish Das <de...@gmail.com>
wrote:

> Yeah, it would be better if we consolidated development on one of
> them: either Breeze or mllib.BLAS.
>

Re: Which linear algebra interface to use within Spark MLlib?

Posted by Debasish Das <de...@gmail.com>.
Yeah, it would be better if we consolidated development on one of
them: either Breeze or mllib.BLAS.

On Thu, Mar 19, 2015 at 2:25 PM, Ulanov, Alexander <al...@hp.com>
wrote:

>  Thanks for the quick response.
>
>  I can use linalg.BLAS.gemm, but this means that I have to use MLlib
> Matrix. The latter does not support some useful functionality needed for
> optimization, for example creating a Matrix from a matrix size, an array,
> and an offset into that array. This means that I would need to create the
> matrix in Breeze and convert it to MLlib. Also, linalg.BLAS lacks some
> useful BLAS functions that I need and that can be found in Breeze (and
> netlib-java). The same concerns apply to MLlib Vector.
>
> Best regards, Alexander
>

Re: Which linear algebra interface to use within Spark MLlib?

Posted by "Ulanov, Alexander" <al...@hp.com>.
Thanks for the quick response.

I can use linalg.BLAS.gemm, but this means that I have to use MLlib Matrix. The latter does not support some useful functionality needed for optimization, for example creating a Matrix from a matrix size, an array, and an offset into that array. This means that I would need to create the matrix in Breeze and convert it to MLlib. Also, linalg.BLAS lacks some useful BLAS functions that I need and that can be found in Breeze (and netlib-java). The same concerns apply to MLlib Vector.

Best regards, Alexander
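
As an illustration of the "size, array and offset" construction described above, here is a minimal sketch in plain Java of a matrix view over a shared array, together with the row/column sums and element-wise function application mentioned earlier in the thread. DenseView and all of its methods are hypothetical names for this example; it is not an MLlib, Breeze, or netlib-java class, and the column-major layout is an assumption.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical column-major matrix view over a shared array; not an MLlib or Breeze class. */
final class DenseView {
    final int rows;
    final int cols;
    final int offset;
    final double[] data;

    DenseView(int rows, int cols, double[] data, int offset) {
        if (rows < 0 || cols < 0 || offset < 0 || offset + rows * cols > data.length) {
            throw new IllegalArgumentException("view does not fit in the backing array");
        }
        this.rows = rows;
        this.cols = cols;
        this.data = data;
        this.offset = offset;
    }

    double get(int i, int j) {
        return data[offset + i + j * rows];
    }

    /** One sum per row. */
    double[] rowSums() {
        double[] sums = new double[rows];
        for (int j = 0; j < cols; j++) {
            for (int i = 0; i < rows; i++) {
                sums[i] += get(i, j);
            }
        }
        return sums;
    }

    /** One sum per column. */
    double[] colSums() {
        double[] sums = new double[cols];
        for (int j = 0; j < cols; j++) {
            for (int i = 0; i < rows; i++) {
                sums[j] += get(i, j);
            }
        }
        return sums;
    }

    /** Applies f to every element of the view, writing back into the shared array. */
    void mapInPlace(DoubleUnaryOperator f) {
        for (int idx = offset; idx < offset + rows * cols; idx++) {
            data[idx] = f.applyAsDouble(data[idx]);
        }
    }

    public static void main(String[] args) {
        double[] params = {9, 9, 1, 2, 3, 4, 5, 6};    // one flat parameter array
        DenseView w = new DenseView(2, 3, params, 2);  // 2 x 3 view starting at index 2
        System.out.println(java.util.Arrays.toString(w.rowSums())); // prints [9.0, 12.0]
        System.out.println(java.util.Arrays.toString(w.colSums())); // prints [3.0, 7.0, 11.0]
        w.mapInPlace(x -> x * x);                      // modifies params[2..7] only
    }
}
```

Because the view only records an offset into the shared array, no copy is made when the matrix is created, which is the property an optimizer holding all parameters in one flat array would rely on.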

On 19.03.2015, at 14:16, "Debasish Das" <de...@gmail.com> wrote:

I think for Breeze we are focused on dot and dgemv right now (along with several other matrix-vector style operations).

dgemm is tricky, since you need to add dgemm for both DenseMatrix and CSCMatrix, and for CSCMatrix you need something like SuiteSparse, which is under the LGPL, so we have to think more about it.

For now, can't you use dgemm directly from mllib.linalg.BLAS? It's in master.





---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Which linear algebra interface to use within Spark MLlib?

Posted by Debasish Das <de...@gmail.com>.
I think for Breeze we are focused on dot and dgemv right now (along with
several other matrix-vector style operations).

dgemm is tricky, since you need to add dgemm for both DenseMatrix
and CSCMatrix, and for CSCMatrix you need something like
SuiteSparse, which is under the LGPL, so we have to think more about it.

For now, can't you use dgemm directly from mllib.linalg.BLAS? It's in
master.
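
To make the dense-versus-sparse distinction above concrete: a CSC matrix stores only its nonzeros, column by column, so multiplying it by a dense matrix needs its own loop rather than a call to dense dgemm. The sketch below is a naive, unoptimized illustration in plain Java. The class and method names are hypothetical, and it is neither Breeze's nor SuiteSparse's implementation.

```java
/** Illustrative only: multiplies a CSC sparse matrix by a column-major dense matrix. */
final class CscTimesDense {
    /**
     * Computes C := A * B into a caller-supplied C.
     * A (m x k) is in compressed sparse column form: colPtr has k + 1 entries,
     * and rowIdx/values hold the nonzeros of each column in order.
     * B is k x n and C is m x n, both column-major.
     */
    static void multiply(int m, int k, int n,
                         int[] colPtr, int[] rowIdx, double[] values,
                         double[] b, double[] c) {
        java.util.Arrays.fill(c, 0, m * n, 0.0);
        for (int j = 0; j < n; j++) {
            for (int p = 0; p < k; p++) {
                double bpj = b[p + j * k];
                // Column p of A contributes bpj * A(:, p) to column j of C.
                for (int t = colPtr[p]; t < colPtr[p + 1]; t++) {
                    c[rowIdx[t] + j * m] += values[t] * bpj;
                }
            }
        }
    }

    public static void main(String[] args) {
        // A = [[1, 0], [0, 2], [3, 0]] in CSC form.
        int[] colPtr = {0, 2, 3};
        int[] rowIdx = {0, 2, 1};
        double[] values = {1, 3, 2};
        double[] b = {1, 3, 2, 4};   // [[1, 2], [3, 4]] stored by column
        double[] c = new double[6];  // 3 x 2 output buffer
        multiply(3, 2, 2, colPtr, rowIdx, values, b, c);
        System.out.println(java.util.Arrays.toString(c)); // prints [1.0, 6.0, 3.0, 2.0, 8.0, 6.0]
    }
}
```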


On Thu, Mar 19, 2015 at 1:49 PM, Ulanov, Alexander <al...@hp.com>
wrote:

>  Thank you! When do you expect to have gemm in Breeze, and when will that
> version of Breeze ship with MLlib?
>
>  Also, could someone please elaborate on linalg.BLAS and Matrix? Are they
> going to be developed further, and should all developers use them in the
> long term?
>
> Best regards, Alexander
>

Re: Which linear algebra interface to use within Spark MLlib?

Posted by "Ulanov, Alexander" <al...@hp.com>.
Thank you! When do you expect to have gemm in Breeze, and when will that version of Breeze ship with MLlib?

Also, could someone please elaborate on linalg.BLAS and Matrix? Are they going to be developed further, and should all developers use them in the long term?

Best regards, Alexander

On 18.03.2015, at 23:21, "Debasish Das" <de...@gmail.com> wrote:

dgemm, dgemv and dot come to Breeze and Spark through netlib-java.

Right now, in both dot and dgemv Breeze does an extra memory allocation, but we have already found the issue and are working on adding a common trait that will provide a sink operation (basically, memory will be allocated by the user). Adding more BLAS operators to Breeze will also help in general, as a lot more operations are defined over there.




Re: Which linear algebra interface to use within Spark MLlib?

Posted by Debasish Das <de...@gmail.com>.
dgemm, dgemv and dot come to Breeze and Spark through netlib-java.

Right now, in both dot and dgemv Breeze does an extra memory allocation, but we
have already found the issue and are working on adding a common trait that
will provide a sink operation (basically, memory will be allocated by the
user). Adding more BLAS operators to Breeze will also help in general, as
a lot more operations are defined over there.
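
One way to read "memory will be allocated by the user" is an operation that takes its output buffer as an argument. The following plain-Java sketch shows that idea for a matrix-vector product. It is only an interpretation for illustration: GemvSink and gemvInto are hypothetical names, and this is not the Breeze trait referred to above, whose design is not shown in this thread.

```java
/** Illustrative only: a matrix-vector product that writes into a caller-supplied "sink". */
final class GemvSink {
    /** Computes y := A * x, where A is m x n column-major and y is owned by the caller. */
    static void gemvInto(int m, int n, double[] a, double[] x, double[] y) {
        if (a.length < m * n || x.length < n || y.length < m) {
            throw new IllegalArgumentException("array too small for the given dimensions");
        }
        java.util.Arrays.fill(y, 0, m, 0.0);
        for (int j = 0; j < n; j++) {
            double xj = x[j];
            for (int i = 0; i < m; i++) {
                y[i] += a[i + j * m] * xj;
            }
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 3, 2, 4};  // [[1, 2], [3, 4]] stored by column
        double[] y = new double[2]; // allocated once, reused for every product
        gemvInto(2, 2, a, new double[] {1, 1}, y);
        System.out.println(java.util.Arrays.toString(y)); // prints [3.0, 7.0]
        gemvInto(2, 2, a, new double[] {2, 0}, y);
        System.out.println(java.util.Arrays.toString(y)); // prints [2.0, 6.0]
    }
}
```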


On Wed, Mar 18, 2015 at 8:09 PM, Ulanov, Alexander <al...@hp.com>
wrote:

> Hi,
>
> I am currently using Breeze within Spark MLlib for linear algebra. I would
> like to reuse previously allocated matrices to store the result of matrix
> multiplication, i.e. I need a "gemm" function, C := q*A*B + p*C, which is
> missing in Breeze (Breeze automatically allocates a new matrix to store
> the result of a multiplication). I would also like to minimize the number
> of gemm calls that Breeze makes. Should I use the mllib.linalg.BLAS
> functions instead? While it has gemm and axpy, it offers a rather limited
> number of operations. For example, I need to sum a matrix by row or by
> column, and to apply a function to all elements of a matrix. Also, the
> MLlib Vector and Matrix interfaces that linalg.BLAS operates on seem rather
> undeveloped. Should I use plain netlib-java instead (and will it remain in
> MLlib in future releases)?
>
> Best regards, Alexander
>