Posted to user@spark.apache.org by Zeming Yu <ze...@gmail.com> on 2017/04/25 00:36:12 UTC

pyspark vector

Hi all,

Beginner question:

What does the 3 mean in (3,[0,1,2],[1.0,1.0,1.0])?

https://spark.apache.org/docs/2.1.0/ml-features.html

 id | texts                           | vector
----|---------------------------------|---------------
 0  | Array("a", "b", "c")            | (3,[0,1,2],[1.0,1.0,1.0])
 1  | Array("a", "b", "b", "c", "a")  | (3,[0,1,2],[2.0,2.0,1.0])

Re: pyspark vector

Posted by Nick Pentreath <ni...@gmail.com>.
The 3 in this case is the size (total length) of the sparse vector; the two
lists after it are the indices of the non-zero entries and their values. The
size equals the number of features, which for CountVectorizer (I assume
that's what you're using) is also the vocab size (number of unique terms).
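
To make the (size, indices, values) layout concrete, here is a rough
plain-Python sketch that rebuilds the two rows from the docs table. It does
not use Spark at all: the helper names to_sparse and to_dense are made up for
illustration, and the vocabulary order ["a", "b", "c"] is an assumption taken
from the table rather than something computed by CountVectorizer.

```python
from collections import Counter

def to_sparse(tokens, vocab):
    """Count terms in `tokens` and return (size, indices, values) over `vocab`."""
    counts = Counter(tokens)
    # Keep only the vocabulary positions whose term actually occurs.
    indices = [i for i, term in enumerate(vocab) if counts[term] > 0]
    values = [float(counts[vocab[i]]) for i in indices]
    # The first element is the vector length, i.e. the vocabulary size.
    return len(vocab), indices, values

def to_dense(size, indices, values):
    """Expand a (size, indices, values) triple into a plain list of floats."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense

vocab = ["a", "b", "c"]  # assumed vocabulary order, as in the docs table

row0 = to_sparse(["a", "b", "c"], vocab)
row1 = to_sparse(["a", "b", "b", "c", "a"], vocab)
row2 = to_sparse(["a", "a"], vocab)  # hypothetical extra row, not in the docs

print(row0)             # (3, [0, 1, 2], [1.0, 1.0, 1.0])
print(row1)             # (3, [0, 1, 2], [2.0, 2.0, 1.0])
print(row2)             # (3, [0], [2.0])
print(to_dense(*row2))  # [2.0, 0.0, 0.0]
```

The hypothetical third row shows why the size is stored separately: only one
index is listed, but the vector still has length 3.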

On Tue, 25 Apr 2017 at 04:06 Peyman Mohajerian <mo...@gmail.com> wrote:

> setVocabSize

Re: pyspark vector

Posted by Peyman Mohajerian <mo...@gmail.com>.
setVocabSize

