Posted to dev@spark.apache.org by Debasish Das <de...@gmail.com> on 2014/05/11 05:35:21 UTC
LabeledPoint toString to dump LibSvm if SparseVector

Hi,

I need to change toString on LabeledPoint to emit LIBSVM format so that I
can dump an RDD[LabeledPoint] in a format that sparse glmnet in R and other
packages can read, in order to benchmark MLlib classification accuracy.

Basically, I have to change the toString of both LabeledPoint and
SparseVector.

Should I submit this as a PR, or is it already being added?

For now, I have added these toLibSvm functions to my internal util class:

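import org.apache.spark.mllib.linalg.SparseVector
import org.apache.spark.mllib.regression.LabeledPoint

// Format a LabeledPoint as one LIBSVM line: "<label> <index>:<value> ...".
// Assumes the features are a SparseVector; the cast fails for dense vectors.
def toLibSvm(labelPoint: LabeledPoint): String =
  labelPoint.label.toString + " " +
    toLibSvm(labelPoint.features.asInstanceOf[SparseVector])

// Format the stored entries as space-separated "index:value" pairs.
// LIBSVM indices are one-based, so MLlib's zero-based indices are shifted by one.
def toLibSvm(features: SparseVector): String =
  features.indices.zip(features.values)
    .map { case (i, v) => s"${i + 1}:$v" }
    .mkString(" ")
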
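
To make the target format concrete, here is a minimal sketch with no Spark dependency. The object and function names (LibSvmSketch, sparseLine, denseLine) are hypothetical and exist only for illustration; it assumes zero-based, ascending input indices and writes them one-based, which is the usual LIBSVM convention.

```scala
object LibSvmSketch {
  // Hypothetical helper: build one LIBSVM line from a label and parallel
  // index/value arrays. Input indices are assumed zero-based and ascending;
  // they are written one-based.
  def sparseLine(label: Double, indices: Array[Int], values: Array[Double]): String = {
    require(indices.length == values.length, "indices and values must have equal length")
    val pairs = indices.zip(values).map { case (i, v) => s"${i + 1}:$v" }
    (label.toString +: pairs).mkString(" ")
  }

  // Dense input: keep only the non-zero entries, then reuse sparseLine.
  def denseLine(label: Double, values: Array[Double]): String = {
    val nonZero = values.zipWithIndex.filter { case (v, _) => v != 0.0 }
    sparseLine(label, nonZero.map(_._2), nonZero.map(_._1))
  }
}
```

With something like this, an RDD[LabeledPoint] could presumably be mapped to one string per point and written out with saveAsTextFile; handling dense vectors separately avoids the asInstanceOf cast.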
Thanks.
Deb



On Fri, May 9, 2014 at 10:09 PM, mateiz <gi...@git.apache.org> wrote:

> Github user mateiz commented on a diff in the pull request:
>
>     https://github.com/apache/spark/pull/685#discussion_r12502569
>
>     --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
>     @@ -100,4 +100,27 @@ class VectorsSuite extends FunSuite {
>          assert(vec2(6) === 4.0)
>          assert(vec2(7) === 0.0)
>        }
>     +
>     +  test("parse vectors") {
>     +    val vectors = Seq(
>     +      Vectors.dense(Array.empty[Double]),
>     +      Vectors.dense(1.0),
>     +      Vectors.dense(1.0, 0.0, -2.0),
>     +      Vectors.sparse(0, Array.empty[Int], Array.empty[Double]),
>     +      Vectors.sparse(1, Array(0), Array(1.0)),
>     +      Vectors.sparse(3, Array(0, 2), Array(1.0, -2.0)))
>     +    vectors.foreach { v =>
>     +      val v1 = Vectors.parse(v.toString)
>     +      assert(v.getClass === v1.getClass)
>     +      assert(v === v1)
>     +    }
>     +
>     +    val malformatted = Seq("1", "[1,,]", "[1,2", "(1,[1,2])", "(1,[1],[2.0,1.0])")
>     +    malformatted.foreach { s =>
>     +      intercept[RuntimeException] {
>     --- End diff --
>
>     Should be Exception instead