Posted to user@spark.apache.org by Andrejs Abele <an...@sindicetech.com> on 2014/10/30 12:38:01 UTC

Getting vector values

Hi,

I'm new to MLlib and Spark. I'm trying to use TF-IDF and use those values
for term ranking.
I'm getting the TF values in vector format, but how can I get the values out
of the vector?

   import org.apache.spark.SparkContext
   import org.apache.spark.mllib.feature.HashingTF
   import org.apache.spark.mllib.linalg.Vector
   import org.apache.spark.rdd.RDD

   val sc = new SparkContext(conf)

   // Each line of the file becomes one document: a Seq of its space-separated terms.
   val documents: RDD[Seq[String]] =
     sc.textFile("/home/andrejs/Datasets/dbpedia/test.txt").map(_.split(" ").toSeq)
   documents.foreach(println(_))

   // Hash each term to an index and count its occurrences per document.
   val hashingTF = new HashingTF()
   val tf: RDD[Vector] = hashingTF.transform(documents)
   tf.foreach(println(_))

My output is:
WrappedArray(a, a, b, c)
WrappedArray(e, a, c, d)

(1048576,[97,99,100,101],[1.0,1.0,1.0,1.0])
(1048576,[97,98,99],[2.0,1.0,1.0])

How can I get [97,99,100,101] and [1.0,1.0,1.0,1.0] out of that?
And how can I map an index to its value, e.g. 100 = 1.0?

Some help is greatly appreciated,

Andrejs

Re: Getting vector values

Posted by Sean Owen <so...@cloudera.com>.
Call toArray on the Vector and print that, or toBreeze.
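
Here is a minimal sketch of both approaches, assuming the Spark 1.x
org.apache.spark.mllib.linalg API; the example vector below mirrors the
output in the original post but is otherwise illustrative:

   import org.apache.spark.mllib.linalg.{SparseVector, Vector, Vectors}

   // A sparse vector like the ones HashingTF produces:
   // size 1048576, non-zero entries at indices 97, 99, 100, 101.
   val v: Vector =
     Vectors.sparse(1048576, Array(97, 99, 100, 101), Array(1.0, 1.0, 1.0, 1.0))

   // Option 1: toArray materializes the full Array[Double]
   // (length 1048576, mostly zeros for a hashed TF vector).
   val dense: Array[Double] = v.toArray
   println(dense(100))                       // 1.0

   // Option 2: match on SparseVector and read only the non-zero entries.
   // indices and values are parallel arrays, so zipping them gives the
   // index -> value mapping asked about (e.g. 100 -> 1.0).
   v match {
     case sv: SparseVector =>
       val indexToValue: Map[Int, Double] = sv.indices.zip(sv.values).toMap
       println(indexToValue(100))            // 1.0
     case other =>
       println(other.toArray.mkString(","))  // dense fallback
   }

The same pattern match can be applied per document, e.g.
tf.map { case sv: SparseVector => sv.indices.zip(sv.values).toMap }
to get an RDD of index -> count maps. Note that toBreeze may not be
public in every Spark version, so toArray or the pattern match above is
the more portable route.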