Posted to dev@spark.apache.org by Debasish Das <de...@gmail.com> on 2014/05/11 18:40:10 UTC

LabeledPoint dump LibSVM if SparseVector

Hi,

I need to change toString on LabeledPoint to emit LibSVM format so that I
can dump an RDD[LabeledPoint] in a format that can be read by sparse
glmnet-R and other packages, to benchmark MLlib classification accuracy.

Basically, I have to change toString on both LabeledPoint and
SparseVector.

Should I add it as a PR, or is it already being added?

For now I have added these toLibSvm functions to my internal util class:

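  import org.apache.spark.mllib.linalg.SparseVector
  import org.apache.spark.mllib.regression.LabeledPoint

  // Format a labeled point as "label index:value ...".
  // Assumes the features are a SparseVector; the cast fails otherwise.
  def toLibSvm(labelPoint: LabeledPoint): String =
    labelPoint.label.toString + " " +
      toLibSvm(labelPoint.features.asInstanceOf[SparseVector])

  // Format the stored entries as space-separated "index:value" pairs.
  // LibSVM indices are one-based, so the zero-based indices are shifted by one.
  def toLibSvm(features: SparseVector): String =
    features.indices.zip(features.values)
      .map { case (i, v) => s"${i + 1}:$v" }
      .mkString(" ")
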
Thanks.
Deb
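
For reference, here is a minimal sketch of the LibSVM line layout that the
helpers above target, written against plain arrays so that it runs without
Spark. The names LibSvmFormat and formatLine are hypothetical, and the
one-based shift assumes the input indices are zero-based, as they are in
MLlib's SparseVector.

```scala
object LibSvmFormat {
  // Builds one LibSVM line: "<label> <index>:<value> ...".
  // Assumes `indices` are zero-based; LibSVM files use one-based indices,
  // so each index is shifted by one on output.
  def formatLine(label: Double, indices: Array[Int], values: Array[Double]): String = {
    require(indices.length == values.length,
      "indices and values must have the same length")
    val features = indices.zip(values).map { case (i, v) => s"${i + 1}:$v" }
    (label.toString +: features).mkString(" ")
  }
}
```

Building each pair with string interpolation avoids the chained replace
calls on tuple output, which depend on how tuples happen to print.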

Re: LabeledPoint dump LibSVM if SparseVector

Posted by Xiangrui Meng <me...@gmail.com>.
Hi Deb,

MLUtils now has a saveAsLibSVMFile method. I also submitted a PR to
standardize the text format of vectors and labeled points:
https://github.com/apache/spark/pull/685
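
A rough sketch of how saveAsLibSVMFile might be called on a small
RDD[LabeledPoint] follows. The object name, the local master setting, the
sample data, and the default output path are all assumptions made for
illustration; only the MLUtils.saveAsLibSVMFile(data, dir) call comes from
the message above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils

object SaveLibSvmExample {
  def main(args: Array[String]): Unit = {
    // Output directory is a placeholder; it must not already exist.
    val dir = args.headOption.getOrElse("/tmp/labeled-points-libsvm")
    // Local master is assumed here so the sketch runs on a single machine.
    val conf = new SparkConf().setAppName("SaveLibSvmExample").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Two sample points with sparse features of size 4.
      val points = sc.parallelize(Seq(
        LabeledPoint(1.0, Vectors.sparse(4, Array(0, 3), Array(2.5, 4.0))),
        LabeledPoint(0.0, Vectors.sparse(4, Array(1), Array(1.5)))
      ))
      // Writes one text line per point into part files under `dir`.
      MLUtils.saveAsLibSVMFile(points, dir)
    } finally {
      sc.stop()
    }
  }
}
```

The output is a directory of part files rather than a single file, so a
tool that expects one file may need the parts concatenated first.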

Best,
Xiangrui
