Posted to user@spark.apache.org by Sameer Tilak <ss...@live.com> on 2014/09/15 20:28:39 UTC

MLLib sparse vector

Hi All,

I have transformed the data into the following format: the first column is the user id, and all the other columns are class ids. For a given user, only the class ids that appear in that row have value 1; all others are 0. I need to create a sparse vector from this. Does the API for creating a sparse vector directly support this format?

User id    Product class ids
2622572    145447 1620 13421 28565 285556 293 4553 67261 130 3646 1671 18806 183576 3286 51715 57671 57476

Re: MLLib sparse vector

Posted by Chris Gore <cd...@cdgore.com>.
It is probably worth noting that the factory methods in MLlib create an object of type org.apache.spark.mllib.linalg.Vector, which stores its data in a format similar to that of Breeze vectors.

Chris
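
As a rough illustration of the point above, the sketch below builds a small vector with `Vectors.sparse` and then pattern-matches on the result to reach the sparse index and value arrays. The toy size and indices are made up for the example and are not from the thread; it assumes spark-mllib is on the classpath (for instance in spark-shell).

```scala
import org.apache.spark.mllib.linalg.{SparseVector, Vector, Vectors}

// Toy data (not from the thread): a length-6 vector with ones at indices 4 and 1.
val v: Vector = Vectors.sparse(6, Seq((4, 1.0), (1, 1.0)))

// The factory returns the general Vector type; match on it to reach the sparse fields.
val (indices, values) = v match {
  case sv: SparseVector => (sv.indices.toSeq, sv.values.toSeq)
  case other            => (Seq.empty[Int], other.toArray.toSeq)
}

// Dense view of the same data, useful for checking small examples.
val dense = v.toArray.toSeq
```

The indices come back in ascending order because the factory sorts the (index, value) pairs before storing them.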

On Sep 15, 2014, at 3:24 PM, Xiangrui Meng <me...@gmail.com> wrote:

> Or you can use the factory method `Vectors.sparse`:
> 
> val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0)))
> 
> where numProducts should be the largest product id plus one.
> 
> Best,
> Xiangrui
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: MLLib sparse vector

Posted by Xiangrui Meng <me...@gmail.com>.
Or you can use the factory method `Vectors.sparse`:

import org.apache.spark.mllib.linalg.Vectors
val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0)))

where numProducts should be the largest product id plus one.

Best,
Xiangrui
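
For the row format in the original question, a minimal sketch of the whole step might look like the following. The helper name `parseRow` is hypothetical, and the code assumes whitespace-separated fields with the user id first and spark-mllib on the classpath. In a real job `numProducts` would be the largest id across all rows plus one; here it is taken from the single sample row.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical helper: turn one row into (userId, 0/1 indicator vector).
def parseRow(line: String, numProducts: Int): (Long, Vector) = {
  val fields = line.trim.split("\\s+")
  val userId = fields.head.toLong
  // distinct guards against repeated ids, which Vectors.sparse rejects.
  val productIds = fields.tail.map(_.toInt).distinct
  (userId, Vectors.sparse(numProducts, productIds.map(id => (id, 1.0)).toSeq))
}

// Sample row from the original post.
val line = "2622572 145447 1620 13421 28565 285556 293 4553 67261 130 3646 " +
  "1671 18806 183576 3286 51715 57671 57476"

// Largest id in this row plus one; use the global maximum for a full data set.
val numProducts = 285556 + 1

val (userId, sv) = parseRow(line, numProducts)
```

On an RDD of such lines, the same helper could be applied with `lines.map(parseRow(_, numProducts))`, where `lines` is an assumed `RDD[String]`.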




Re: MLLib sparse vector

Posted by Chris Gore <cd...@cdgore.com>.
Hi Sameer,

MLlib uses Breeze’s vector format under the hood.  You can use that.
http://www.scalanlp.org/api/breeze/index.html#breeze.linalg.SparseVector

For example:

import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}

// Class ids serve as indices, so the length is the largest id plus one, not the distinct count.
val numClasses = classes.max() + 1

val userWithClassesAsSparseVector = rows.map { x =>
  val indices = x.classIDs.sorted  // Breeze expects indices in ascending order
  (x.userID, new BSV[Double](indices, Array.fill(indices.length)(1.0), numClasses): BV[Double])
}

Chris
