Posted to user@spark.apache.org by Joseph Bradley <jo...@databricks.com> on 2015/03/01 01:25:39 UTC

Re: Is there any Sparse Matrix implementation in Spark/MLib?

Hi Shahab,

There are actually a few distributed Matrix types which support sparse
representations: RowMatrix, IndexedRowMatrix, and CoordinateMatrix.
The documentation has a bit more info about the various uses:
http://spark.apache.org/docs/latest/mllib-data-types.html#distributed-matrix

The Spark 1.3 RC includes a new one: BlockMatrix.

But because these are distributed, they are backed by RDDs, so operations
on them will of course be slower than the same computations on smaller,
locally stored matrices.
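
As a rough illustration of the difference between a coordinate-style and a
row-style sparse representation, here is a minimal sketch in plain Python.
It is not Spark's API: the function names (entries_to_indexed_rows,
sparse_row_dot, multiply) are hypothetical, everything lives in local
memory rather than in an RDD, and it assumes each (row, column) pair
appears at most once.

```python
from collections import defaultdict


def entries_to_indexed_rows(entries):
    """Group (row, col, value) triples into {row: {col: value}}.

    Hypothetical helper: mimics, in local memory, the idea of turning
    coordinate entries into indexed sparse rows. Explicit zeros are dropped.
    """
    rows = defaultdict(dict)
    for i, j, v in entries:
        if v != 0.0:
            rows[i][j] = v
    return dict(rows)


def sparse_row_dot(row, dense_vector):
    """Dot product of one sparse row ({col: value}) with a dense list."""
    return sum((v * dense_vector[j] for j, v in row.items()), 0.0)


def multiply(rows, num_rows, dense_vector):
    """Sparse matrix times dense vector; rows with no entries yield 0.0."""
    return [sparse_row_dot(rows.get(i, {}), dense_vector)
            for i in range(num_rows)]


# A 3 x 4 matrix with three non-zero entries, in coordinate form.
entries = [(0, 0, 1.0), (0, 3, 2.0), (2, 1, 3.0)]
rows = entries_to_indexed_rows(entries)
result = multiply(rows, 3, [1.0, 1.0, 1.0, 1.0])
```

Only the non-zero entries are stored and visited, which is the point of a
sparse representation. In a distributed setting the entries or rows would
be spread across partitions, and that is where the extra overhead comes
from.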

Joseph

On Fri, Feb 27, 2015 at 4:39 AM, Ritesh Kumar Singh <
riteshoneinamillion@gmail.com> wrote:

> Try using Breeze (a Scala linear algebra library).
>
> On Fri, Feb 27, 2015 at 5:56 PM, shahab <sh...@gmail.com> wrote:
>
>> Thanks a lot Vijay, let me see how it performs.
>>
>> Best
>> Shahab
>>
>>
>> On Friday, February 27, 2015, Vijay Saraswat <vi...@saraswat.org> wrote:
>>
>>> Available in GML --
>>>
>>> http://x10-lang.org/x10-community/applications/global-matrix-library.html
>>>
>>> We are exploring how to make it available within Spark. Any ideas would
>>> be much appreciated.
>>>
>>> On 2/27/15 7:01 AM, shahab wrote:
>>>
>>>> Hi,
>>>>
>>>> I am wondering whether there is any sparse matrix implementation
>>>> available in Spark that can be used in a Spark application.
>>>>
>>>> best,
>>>> /Shahab
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>>
>

Re: Is there any Sparse Matrix implementation in Spark/MLib?

Posted by shahab <sh...@gmail.com>.
Thanks, Joseph, for the comments. I think I need to do some benchmarking.

best,
/Shahab


Re: Is there any Sparse Matrix implementation in Spark/MLib?

Posted by shahab <sh...@gmail.com>.
Thanks, Vijay, but the setup requirements for GML were not at all
straightforward for me, so I have put it aside for now.

best,
/Shahab

On Sun, Mar 1, 2015 at 9:34 AM, Vijay Saraswat <vi...@saraswat.org> wrote:

>  GML is a fast, distributed, in-memory sparse (and dense) matrix
> library.
>
> It does not use RDDs for resilience. Instead we have examples that use
> Resilient X10 (which provides recovery of distributed control structures in
> case of node failure) and Hazelcast for stable storage.
>
> We are looking to benchmark with RDDs to compare overhead, and also
> looking to see how the same ideas could be realized on top of RDDs.

Re: Is there any Sparse Matrix implementation in Spark/MLib?

Posted by Vijay Saraswat <vi...@saraswat.org>.
GML is a fast, distributed, in-memory sparse (and dense) matrix library.

It does not use RDDs for resilience. Instead we have examples that use 
Resilient X10 (which provides recovery of distributed control structures 
in case of node failure) and Hazelcast for stable storage.

We are looking to benchmark with RDDs to compare overhead, and also 
looking to see how the same ideas could be realized on top of RDDs.

