Posted to dev@spark.apache.org by Debasish Das <de...@gmail.com> on 2014/05/11 05:35:21 UTC
LabeledPoint toString to dump LibSvm if SparseVector
Hi,
I need LabeledPoint's toString to produce LibSVM format so that I can
dump an RDD[LabeledPoint] in a format that sparse glmnet-R and other
packages can read, to benchmark MLlib classification accuracy.
Basically, I have to change the toString of both LabeledPoint and
SparseVector.
Should I open a PR for this, or is it already being added?
For now, I have added these toLibSvm functions to my internal util class:
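import org.apache.spark.mllib.linalg.SparseVector
import org.apache.spark.mllib.regression.LabeledPoint

// Format a LabeledPoint as one LibSVM line: "<label> <index>:<value> ...".
// Assumes sparse features; a DenseVector would throw ClassCastException.
def toLibSvm(labelPoint: LabeledPoint): String = {
  val features = toLibSvm(labelPoint.features.asInstanceOf[SparseVector])
  if (features.isEmpty) labelPoint.label.toString
  else labelPoint.label.toString + " " + features
}

// LibSVM indices are 1-based, while SparseVector indices are 0-based.
def toLibSvm(features: SparseVector): String = {
  features.indices.zip(features.values)
    .map { case (i, v) => s"${i + 1}:$v" }
    .mkString(" ")
}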
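A minimal sketch of the same conversion that runs without Spark, for checking the output format in isolation. It assumes a hypothetical `SparseFeatures` case class standing in for MLlib's `SparseVector`; `LibSvmFormat.format` and `LibSvmFormat.parse` are illustrative names, not an MLlib API. It shifts the 0-based vector indices to the 1-based indices LibSVM expects, builds each `index:value` token directly, and includes a small parser so a round trip can be checked, much like the "parse vectors" test quoted below.

```scala
// Hypothetical stand-in for MLlib's SparseVector: 0-based indices with parallel values.
final case class SparseFeatures(size: Int, indices: Array[Int], values: Array[Double])

object LibSvmFormat {
  // Render "<label> <i1>:<v1> <i2>:<v2> ...", shifting indices to LibSVM's 1-based convention.
  // LibSVM lines carry no vector dimension, so `size` is not written out.
  def format(label: Double, features: SparseFeatures): String = {
    val pairs = features.indices.zip(features.values).map { case (i, v) => s"${i + 1}:$v" }
    (label.toString +: pairs).mkString(" ")
  }

  // Parse one LibSVM line back into (label, 0-based indices, values).
  // Malformed tokens surface as exceptions (NumberFormatException or MatchError).
  def parse(line: String): (Double, Array[Int], Array[Double]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val (indices, values) = tokens.tail.map { token =>
      val Array(i, v) = token.split(':')
      (i.toInt - 1, v.toDouble)
    }.unzip
    (label, indices, values)
  }
}
```

For example, `LibSvmFormat.format(1.0, SparseFeatures(3, Array(0, 2), Array(1.0, -2.0)))` yields `1.0 1:1.0 3:-2.0`, and a vector with no active entries produces just the label, with no trailing space.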
Thanks.
Deb
On Fri, May 9, 2014 at 10:09 PM, mateiz <gi...@git.apache.org> wrote:
> Github user mateiz commented on a diff in the pull request:
>
> https://github.com/apache/spark/pull/685#discussion_r12502569
>
> --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
> @@ -100,4 +100,27 @@ class VectorsSuite extends FunSuite {
> assert(vec2(6) === 4.0)
> assert(vec2(7) === 0.0)
> }
> +
> + test("parse vectors") {
> + val vectors = Seq(
> + Vectors.dense(Array.empty[Double]),
> + Vectors.dense(1.0),
> + Vectors.dense(1.0, 0.0, -2.0),
> + Vectors.sparse(0, Array.empty[Int], Array.empty[Double]),
> + Vectors.sparse(1, Array(0), Array(1.0)),
> + Vectors.sparse(3, Array(0, 2), Array(1.0, -2.0)))
> + vectors.foreach { v =>
> + val v1 = Vectors.parse(v.toString)
> + assert(v.getClass === v1.getClass)
> + assert(v === v1)
> + }
> +
> + val malformatted = Seq("1", "[1,,]", "[1,2", "(1,[1,2])", "(1,[1],[2.0,1.0])")
> + malformatted.foreach { s =>
> + intercept[RuntimeException] {
> --- End diff --
>
> Should be Exception instead
>