Posted to user@spark.apache.org by Sameer Tilak <ss...@live.com> on 2014/09/15 20:28:39 UTC
MLLib sparse vector
Hi All,
I have transformed the data into the following format: the first column is the user id, and all the other columns are class ids. For a given user, only the class ids that appear in that row have value 1; all others are 0. I need to create a sparse vector from this. Does the API for creating a sparse vector directly support this format?
User id Product class ids
2622572 145447 1620 13421 28565 285556 293 4553 67261 130 3646 1671 18806 183576 3286 51715 57671 57476
Re: MLLib sparse vector
Posted by Chris Gore <cd...@cdgore.com>.
Probably worth noting that the factory methods in MLlib create an object of type org.apache.spark.mllib.linalg.Vector, which stores data in a format similar to that of Breeze vectors.
Chris
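To illustrate the point about factory methods, here is a minimal sketch, assuming spark-mllib is on the classpath (for example in spark-shell). The variable names and the toy size of 5 are made up for illustration. Both sparse forms and the dense form come back typed as org.apache.spark.mllib.linalg.Vector:

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Sparse vector built from (index, value) pairs.
val fromPairs: Vector = Vectors.sparse(5, Seq((1, 1.0), (3, 1.0)))

// The same vector built from parallel index and value arrays.
val fromArrays: Vector = Vectors.sparse(5, Array(1, 3), Array(1.0, 1.0))

// Dense equivalent, with the zeros stored explicitly.
val dense: Vector = Vectors.dense(0.0, 1.0, 0.0, 1.0, 0.0)
```

All three hold the same values; only the storage differs.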
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: MLLib sparse vector
Posted by Xiangrui Meng <me...@gmail.com>.
Or you can use the factory method `Vectors.sparse`:
val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0)))
where numProducts should be the largest product id plus one.
Best,
Xiangrui
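As a rough sketch of how the suggestion above could be applied to one row of the original data, assuming each row is a whitespace-separated line of the form "userId classId classId ..." and that spark-mllib is on the classpath (for example in spark-shell). The helper name rowToSparseVector and its numClasses parameter are hypothetical, not part of MLlib:

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical helper: turns "userId classId classId ..." into (userId, sparse vector).
// numClasses is the vector size and must be greater than the largest class id.
def rowToSparseVector(line: String, numClasses: Int): (Long, Vector) = {
  val fields = line.trim.split("\\s+")
  val userId = fields.head.toLong
  // The array-based Vectors.sparse expects strictly increasing indices,
  // so de-duplicate and sort the class ids first.
  val indices = fields.tail.map(_.toInt).distinct.sorted
  val values = Array.fill(indices.length)(1.0)
  (userId, Vectors.sparse(numClasses, indices, values))
}

// The sample row from the original question; its largest class id is 285556.
val line = "2622572 145447 1620 13421 28565 285556 293 4553 67261 130 3646 1671 18806 183576 3286 51715 57671 57476"
val (userId, sv) = rowToSparseVector(line, numClasses = 285557)
```

On an RDD of such lines, the same helper could be applied with something like lines.map(rowToSparseVector(_, numClasses)), once numClasses has been computed over the whole data set.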
Re: MLLib sparse vector
Posted by Chris Gore <cd...@cdgore.com>.
Hi Sameer,
MLlib uses Breeze’s vector format under the hood. You can use that.
http://www.scalanlp.org/api/breeze/index.html#breeze.linalg.SparseVector
For example:
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
// Vector length; this assumes class IDs are 0-based indices below numClasses
val numClasses = classes.distinct.count.toInt
val userWithClassesAsSparseVector = rows.map { x =>
  // Breeze expects the index array in increasing order; each listed class gets value 1.0
  val indices = x.classIDs.sorted
  val vector: BV[Double] = new BSV[Double](indices, Array.fill(indices.length)(1.0), numClasses)
  (x.userID, vector)
}
Chris